Hi Alexei,
On Tue, Apr 1, 2014 at 5:29 AM, Alexei Starovoitov <[email protected]> wrote:
> On Mon, Mar 31, 2014 at 3:01 AM, Jovi Zhangwei <[email protected]> wrote:
>> Hi Ingo,
>>
>> On Mon, Mar 31, 2014 at 3:17 PM, Ingo Molnar <[email protected]> wrote:
>>>
>>> * Jovi Zhangwei <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> The following set of patches add ktap tracing tool.
>>>>
>>>> ktap is a new script-based dynamic tracing tool for Linux.
>>>> It uses a scripting language and lets the user trace system dynamically.
>>>>
>>>> Highlights features:
>>>> * a simple but powerful scripting language
>>>> * register-based interpreter (heavily optimized) in Linux kernel
>>>> * small and lightweight
>>>> * not depend on the GCC toolchain for each script run
>>>> * easy to use in embedded environments without debugging info
>>>> * support for tracepoint, kprobe, uprobe, function trace, timer, and more
>>>> * supported in x86, ARM, PowerPC, MIPS
>>>> * safety in sandbox
>>>
>>> I've asked this fundamental design question before but got no full
>>> answer: how does ktap compare to the ongoing effort of improving the
>>> BPF scripting engine?
>>>
>>
>> From long experiences of ktap development, what make me really
>> love ktap is:
>>
>> 1) Availability
>> ktap is only available tool to use in small embedded platform, stap
>> and BPF both need GCC now, stap have its own language, so it's much
>> better than BPF.
>> (IMO there may need several years to complete a skeleton of dynamic
>> tracing script language, see stap and dtrace)
>>
>> 2) Simplicity
>> ktap is simplest dynamic scripting trace solution now in Linux world,
>> compare with stap/dtrace/BPF.
>> a). It have simple syntax which make many people like it, it have
>> b). It have simple associate array, make dynamic tracing powerful.
>> c). It have a simple compiler which only have 87K in x86_64.
>> d). It have a simple tracing syntax which constant with perf events.
>>
>> 3) Safety
>> ktap already delivered its safety to end user, many people use ktap
>> in their dev lab to investigate problem.
>> But BPF need time to prove its safety, especially proved by end user,
>> and IMO BPF safety check would be more complex if the runtime
>> support more features as time goes.
>
> safety of ktap is arguable.
>
> 1.
> From the diff it seems that 'loop_count' is a dynamic way of
> checking that loops are not infinite, but max_loop_count = 100000
> if loop body has many instructions, such large count may trigger
> hung_task panic.
>
Actually, I'm planning to use a time-based limit to avoid this; it's a minor issue.
> 2.
> jumps are not counted, so if userspace makes an error and loads
> ktap bytecode with wrong jumps, the interpreter will hang.
>
There is a TODO left in the validation code. Kernel developers don't
like many TODOs in there, so I will address this as well.
> 3.
> recursive functions and f1()->f2()->f1() are not detected either.
> Another possible hang?
>
No, the ktap stack overflow check will make it exit.
> 4.
> bc_[ft]new instruction are allocating memory and garbage collector
> suppose to free things when ktap module is unloaded, right?
> since max_loop_cnt is 100k, a script can allocate quite a bit of memory
> and kernel will be waiting for userspace trigger to free it?
> Sounds dangerous.
>
There will be a limit on the number of tables and functions, so this is not a problem.
> These concerns are just from quick code review.
>
>> 4). Samples
>> Many people like those ktap samples, ktap shows the attractive by
>> samples.
>>
>> Even I so love ktap and would like share ktap values to everyone, but in
>> technical point of view, I still agree with you that there should have
>> unified scripting engines in kernel if that engine can service for many
>> domains(like networking), but that solution should show its availability/
>> simplicity/safety firstly to user, not just proved by end user.
>>
>> Dynamic tracing scripting environment should contains:
>> simple compiler, clean language syntax, fast script engine,
>> associative array, aggregation, kstack, ustack, event management,
>> ring buffer, samples, tapset/library, CTF, etc.
>>
>> ktap already fixed most of these issues by its simple design, but
>> BPF only have "script engine" part(its associative array still cannot
>> vmalloc), which have long road before could use by end user.
>
> 'internal bpf' instruction set is an assembler instruction set.
> Low level just like x86 instruction set.
> It doesn't have vmalloc instruction and shouldn't have.
> 'internal bpf' program can theoretically make a call to allocate
> memory, but I don't think it's safe to let loadable programs to
> arbitrarily allocate memory.
> It's a matter of ownership of the memory.
> If script can allocate and receive a pointer to memory,
> the script owns that memory and kernel cannot touch it until
> script does 'free' or terminates and GC kicks in.
Wrong, ktap doesn't have a vmalloc instruction; ktap only uses
vmalloc to pre-allocate tables and the memory pool.
> ktap can be invoked through timers, so this dynamically
> allocated tables may be living for long time affecting the whole
> system. The tracing tool should be safer than that.
>
Wrong again, a ktap table cannot be allocated in timer context.
>> ktap is not just bring a bytecode engine, it bring a complete simple
>> dynamic tracing environment to end user, it bring clean language syntax,
>> samples, flexible table, perf like event management, etc, those is the
>> key part to end user, not bytecode engine, so if we can develop simple
>> BPF compiler with similar ktap syntax in some day, then we can replace
>> kp_lex.c/kp_parse.c/kp_vm.c, and there have zero reason why other
>> parts cannot be shared(associative array, aggregation, kstack, ustack,
>> event management, ring buffer, samples, tapset/library, CTF, etc).
>
> I think nothing stops ktap userspace to parse ktap language
> and generate 'internal bpf' format. gcc is unnecessary here.
It's a big engineering problem. BPF bytecode is too low level, and the
BPF engine exposes too much low-level detail to the end user. See this
bpf example:

void dropmon(struct bpf_context *ctx) {
        void *loc;
        uint64_t *drop_cnt;

        loc = (void *)ctx->arg2;

        drop_cnt = bpf_table_lookup(ctx, 0, &loc);
        if (drop_cnt) {
                __sync_fetch_and_add(drop_cnt, 1);
        } else {
                uint64_t init = 0;
                bpf_table_update(ctx, 0, &loc, &init);
        }
}

IMO there are many issues in this simple script.

If the user forgets to add the drop_cnt check, what will happen? It
will dereference a NULL pointer in __sync_fetch_and_add.
And how do you make sure the drop_cnt pointer is a valid memory address
in the table, not some other kernel memory allocation?

Look at the bpf_table_update function: if the bpf table overflows, there
is no way to stop the script from executing at that point, so it goes on
to do completely the wrong thing. You would have to add an exit condition
check after bpf_table_update (and maybe after most C function calls).

And obviously you forgot to add table lock/unlock in there.

In contrast, look at a ktap script with the same functionality:

    var s = {}

    trace skb:kfree_skb {
        s[arg2] += 1
    }

The user doesn't need to handle error checking or table locking at all,
at either the source level or the bytecode level.

From the end user's point of view, they want a clean language syntax
like the ktap example above, so if bpf has the same dynamic tracing
goal, it should follow this way.

BPF is good, but it has many engineering problems to solve before it
becomes a usable dynamic tracing solution.

> I personally think that tracing scripts in C are more readable,
> but that's minor.
No end user will use your dropmon.c in their dynamic tracing
environment.
> But before we go about generating either 'internal bpf' or any
> other format, we need to discuss safe scripting design principles.
> We already have systemtap that is relying on userspace for verification.
> If we want a real alternative to systemtap, kernel should take care
> of safety.
>
> Note I'm not proposing to expose 'internal bpf' to userspace in uapi
> headers and I think ktap shouldn't do it either.
> Kernel hosted userspace component (like perf) that uses
> kernel specific headers allows for much cleaner interfaces
> without creating 'forever maintain' headache.
>
Hmm, looks reasonable.
Thanks.
Jovi
On Mon, Mar 31, 2014 at 9:47 PM, Jovi Zhangwei <[email protected]> wrote:
> Hi Alexei,
>
> On Tue, Apr 1, 2014 at 5:29 AM, Alexei Starovoitov <[email protected]> wrote:
>> On Mon, Mar 31, 2014 at 3:01 AM, Jovi Zhangwei <[email protected]> wrote:
>>> Hi Ingo,
>>>
>>> On Mon, Mar 31, 2014 at 3:17 PM, Ingo Molnar <[email protected]> wrote:
>>>>
>>>> * Jovi Zhangwei <[email protected]> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> The following set of patches add ktap tracing tool.
>>>>>
>>>>> ktap is a new script-based dynamic tracing tool for Linux.
>>>>> It uses a scripting language and lets the user trace system dynamically.
>>>>>
>>>>> Highlights features:
>>>>> * a simple but powerful scripting language
>>>>> * register-based interpreter (heavily optimized) in Linux kernel
>>>>> * small and lightweight
>>>>> * not depend on the GCC toolchain for each script run
>>>>> * easy to use in embedded environments without debugging info
>>>>> * support for tracepoint, kprobe, uprobe, function trace, timer, and more
>>>>> * supported in x86, ARM, PowerPC, MIPS
>>>>> * safety in sandbox
>>>>
>>>> I've asked this fundamental design question before but got no full
>>>> answer: how does ktap compare to the ongoing effort of improving the
>>>> BPF scripting engine?
>>>>
>>>
>>> From long experiences of ktap development, what make me really
>>> love ktap is:
>>>
>>> 1) Availability
>>> ktap is only available tool to use in small embedded platform, stap
>>> and BPF both need GCC now, stap have its own language, so it's much
>>> better than BPF.
>>> (IMO there may need several years to complete a skeleton of dynamic
>>> tracing script language, see stap and dtrace)
>>>
>>> 2) Simplicity
>>> ktap is simplest dynamic scripting trace solution now in Linux world,
>>> compare with stap/dtrace/BPF.
>>> a). It have simple syntax which make many people like it, it have
>>> b). It have simple associate array, make dynamic tracing powerful.
>>> c). It have a simple compiler which only have 87K in x86_64.
>>> d). It have a simple tracing syntax which constant with perf events.
>>>
>>> 3) Safety
>>> ktap already delivered its safety to end user, many people use ktap
>>> in their dev lab to investigate problem.
>>> But BPF need time to prove its safety, especially proved by end user,
>>> and IMO BPF safety check would be more complex if the runtime
>>> support more features as time goes.
>>
>> safety of ktap is arguable.
>>
>> 1.
>> From the diff it seems that 'loop_count' is a dynamic way of
>> checking that loops are not infinite, but max_loop_count = 100000
>> if loop body has many instructions, such large count may trigger
>> hung_task panic.
>>
> Actually I'm planing use time-based time to avoid this, minor issue.
>
>> 2.
>> jumps are not counted, so if userspace makes an error and loads
>> ktap bytecode with wrong jumps, the interpreter will hang.
>>
> There leave a todo in validation code, as kernel developers don't like
> many todo in there, so I will also address it.
>
>> 3.
>> recursive functions and f1()->f2()->f1() are not detected either.
>> Another possible hang?
>>
> No, it will exit by ktap stack overflow check.
>
>> 4.
>> bc_[ft]new instruction are allocating memory and garbage collector
>> suppose to free things when ktap module is unloaded, right?
>> since max_loop_cnt is 100k, a script can allocate quite a bit of memory
>> and kernel will be waiting for userspace trigger to free it?
>> Sounds dangerous.
>>
> There will have table/function number limitation, so this is not a problem.
>
>> These concerns are just from quick code review.
ok. spotted a few more things:
5.
bc_callt (ktap tailcall) doesn't have loop_count check.
so tailcall can loop forever.
Of course you can fix it with elapsed time check at every branch
or call instruction.
6.
do_bc_kstr doesn't check that 'd' value is valid and goes kbase[~d]
Can be fixed of course.
7.
uget/uset and others seem to have similar problems.
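
To make item 6 concrete, here is a minimal sketch of a bounds-checked
constant fetch. It assumes, as the kbase[~d] expression suggests, that
constants sit at negative offsets just below kbase. The names
checked_const and nk and the long element type are hypothetical and
are not taken from the ktap source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Fetch constant number d from a pool of nk constants stored below kbase,
 * i.e. in kbase[-1] .. kbase[-nk]. Returns false instead of reading out
 * of bounds when the bytecode operand d is invalid.
 */
static bool checked_const(const long *kbase, size_t nk, unsigned int d,
                          long *out)
{
        assert(kbase != NULL && out != NULL);

        if ((size_t)d >= nk)
                return false;           /* reject a bad operand */

        *out = kbase[~(ptrdiff_t)d];    /* ~d == -(d + 1) */
        return true;
}
```

An interpreter would bail out of the script when this returns false.
The same check could instead run once at load time, so the hot path
stays unchecked.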
It seems that your definition of 'safe ktap' is that user cannot break
kernel if he uses ktap scripting syntax.
In that sense ktap is not much different from stap.
Overall it seems you view ktap bytecode as a continuation
of ktap syntax.
ktap language allows to read pid,uid,tid, so they were added as
separate instructions to ktap bytecode...
ktap allows dump of a table, so kernel has to do tab_histdump()
including sorting of fields and printf formatting.
What if ktap user wants a different table dump?
or new features from the language?
keep extending bytecode for every printf tweak is not a great solution.
I think design approach to ktap needs to change.
What I'm proposing is the following:
- keep ktap syntax as-is, but remove loops
- ktap style of accessing tables is definitely less verbose than C,
so keep it, but don't let the compiled program own the memory
- keep table dump as-is, but do it in userspace instead
In other words compiler for ktap scripts can generate kernel program
and userspace program at the same time.
the end users won't notice the difference vs what you have now.
we should learn from BPF design mistake:
BPF was
- user space interface
- safe instruction set
- execution engine
all at the same time and it was hard to extend it, since all
aspects need to be considered.
We need to break this dependency.
'internal bpf' is an execution engine.
it's a low level assembler language like x86.
Think of it as renamed x86 assembler, where registers
are called r1, r2 instead of rdi, rsi.
safety comes from verifier which is decoupled from execution.
It can allow complex program or very dumb ones.
Today classic bpf is still used as kernel-user interface,
it goes through bpf checker and converted to 'internal bpf'
for faster execution.
In this case bpf checker allows all existing bpf programs,
but ibpf execution engine can do a lot more.
ktap can follow similar approach.
Though I think C as a language to express filters is simpler,
ktap syntax is fine as well.
ktap compiler can generate ibpf instructions and let
kernel verify them.
ibpf verifier that I've posted earlier has enough knobs
to be used as very strict or permissive depending on the
kernel component, while both being safe from 'not crashing
or hanging kernel' point of view.
Like loops are always disallowed, all memory/register
accesses must be valid, data and control dependency
between instructions are checked.
Best part is that ktap syntax and features can evolve
without ever touching execution engine.
> Wrong, ktap don't have vmalloc instruction, ktap only use
> vmalloc for table and memory pool pre-allocation.
allowing script to own allocated memory is where we diverge
on the approach to safety.
if script can loop or allocate memory, you'd need to dynamically
track elapsed time, all allocated memory and all read/write
accesses from the program, so execution engine slows
down and becomes enforcer of safety.
Every new instruction in such engine needs to be considered
from safety point of view. The same problem plagued old bpf.
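
As a rough illustration of what "engine as enforcer" means, here is a
toy interpreter that charges a budget on every instruction, so that
loops, jumps and tail calls are all covered by one check. The opcodes,
vm_run and the budget parameter are made up for this sketch; neither
ktap nor ibpf works exactly like this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction toy VM, for illustration only. */
enum vm_op { VM_DEC, VM_JNZ, VM_HALT };

struct vm_insn {
        enum vm_op op;
        int target;             /* jump target, used by VM_JNZ only */
};

enum vm_result { VM_OK, VM_BUDGET_EXCEEDED, VM_BAD_JUMP };

/*
 * Run prog with one counter register. Every executed instruction costs
 * one unit of budget, so the check sits on the hot path of the engine.
 */
static enum vm_result vm_run(const struct vm_insn *prog, int len,
                             long counter, long budget)
{
        int pc = 0;

        assert(prog != NULL && len > 0);

        while (pc >= 0 && pc < len) {
                if (budget-- <= 0)
                        return VM_BUDGET_EXCEEDED;

                switch (prog[pc].op) {
                case VM_DEC:
                        counter--;
                        pc++;
                        break;
                case VM_JNZ:
                        if (counter == 0) {
                                pc++;
                                break;
                        }
                        /* the jump target is also checked at run time */
                        if (prog[pc].target < 0 || prog[pc].target >= len)
                                return VM_BAD_JUMP;
                        pc = prog[pc].target;
                        break;
                case VM_HALT:
                        return VM_OK;
                default:
                        return VM_BAD_JUMP;
                }
        }
        return VM_BAD_JUMP;     /* ran off the end of the program */
}
```

Every instruction pays for the budget test here, which is the slowdown
described above. A load-time verifier that rejects loops would let the
engine drop that test.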
> It's a big engineering problem, BPF bytecode is too low level,
> BPF engine exposed too much low level stuff to end user, see bpf example:
you're mixing layers here.
'internal bpf' is a low level execution engine.
Nothing prevents userspace to have ktap or C or any other syntax.
> void dropmon(struct bpf_context *ctx) {
> void *loc;
> uint64_t *drop_cnt;
>
> loc = (void *)ctx->arg2;
>
> drop_cnt = bpf_table_lookup(ctx, 0, &loc);
> if (drop_cnt) {
> __sync_fetch_and_add(drop_cnt, 1);
> } else {
> uint64_t init = 0;
> bpf_table_update(ctx, 0, &loc, &init);
> }
> }
>
> IMO there have many issues in this simple script.
>
> If user forget add drop_cnt check, what will
> happen, it will reference NULL pointer in __sync_fetch_and_add.
> How to make sure drop_cn pointer is a valid memory address in table,
> not other kernel memory allocation?
good question :)
The way verifier guarantees correctness is the following.
'bpf_table_lookup' is annotated as 'returns valid memory of size X
or NULL', so the verifier follows that value in a register through the
control flow graph. In the if (drop_cnt) branch, drop_cnt is valid memory.
In the else branch, drop_cnt is NULL.
I've explained it in more detail in the verifier patch and the doc.
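
A toy version of that idea, for illustration only: the verifier keeps
an abstract state for one register and walks both sides of a NULL
test. The opcodes, the state names and toy_verify are hypothetical and
much simpler than the verifier in the patch. Only forward jumps are
accepted, which also rules out loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Abstract state of register r0 as seen by the toy verifier. */
enum reg_state { REG_SCALAR, REG_PTR_OR_NULL, REG_PTR_VALID, REG_NULL };

enum toy_op {
        OP_CALL_LOOKUP, /* r0 = lookup(): "valid memory or NULL" */
        OP_JEQ_NULL,    /* if (r0 == NULL) goto target */
        OP_DEREF,       /* access *r0 */
        OP_EXIT
};

struct toy_insn {
        enum toy_op op;
        int target;     /* used by OP_JEQ_NULL only */
};

/* Check every path starting at pc; false if any path is unsafe. */
static bool verify_from(const struct toy_insn *prog, int len, int pc,
                        enum reg_state r0)
{
        while (pc < len) {
                const struct toy_insn *in = &prog[pc];

                switch (in->op) {
                case OP_CALL_LOOKUP:
                        r0 = REG_PTR_OR_NULL;
                        break;
                case OP_JEQ_NULL:
                        /* forward jumps only, so no loops */
                        if (in->target <= pc || in->target >= len)
                                return false;
                        if (r0 == REG_SCALAR)
                                return false;
                        /* taken branch: r0 is known to be NULL there */
                        if (r0 != REG_PTR_VALID &&
                            !verify_from(prog, len, in->target, REG_NULL))
                                return false;
                        if (r0 == REG_NULL)
                                return true; /* fall-through unreachable */
                        /* fall-through: r0 is known to be non-NULL */
                        r0 = REG_PTR_VALID;
                        break;
                case OP_DEREF:
                        if (r0 != REG_PTR_VALID)
                                return false;
                        break;
                case OP_EXIT:
                        return true;
                default:
                        return false;
                }
                pc++;
        }
        return false;   /* fell off the end without OP_EXIT */
}

static bool toy_verify(const struct toy_insn *prog, int len)
{
        assert(prog != NULL && len > 0);
        return verify_from(prog, len, 0, REG_SCALAR);
}
```

A program that dereferences the lookup result without testing it is
rejected at load time, so the forgotten-check case never reaches the
execution engine.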
> Look bpf_table_update function, if bpf table overflow, there have
> no way to stop script executing in there, which make completely
> wrong things, so you have to add exit condition checking after
> bpf_table_update(and maybe most C function calls).
if you think 'table overflow' notification should be hidden
from script writer, then go for it.
ktap can generate ibpf that does bpf_table_update and
aborts the script if the limit is hit.
All these decisions are up to userspace and language compiler.
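
For example, the sequence a compiler might emit for 's[arg2] += 1'
could look roughly like the sketch below, written in C rather than
ibpf for readability. The fixed-size toy_table and the toy_* helpers
are stand-ins invented for this sketch, not the bpf_table_* API. A
false return from toy_count models "abort the script".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_CAP 4         /* hypothetical table limit */

struct toy_table {
        uint64_t keys[TOY_TABLE_CAP];
        uint64_t vals[TOY_TABLE_CAP];
        size_t used;
};

/* Returns a pointer to the value for key, or NULL if it is absent. */
static uint64_t *toy_lookup(struct toy_table *t, uint64_t key)
{
        for (size_t i = 0; i < t->used; i++)
                if (t->keys[i] == key)
                        return &t->vals[i];
        return NULL;
}

/* Inserts or overwrites key; returns false when the table is full. */
static bool toy_update(struct toy_table *t, uint64_t key, uint64_t val)
{
        uint64_t *slot = toy_lookup(t, key);

        if (slot) {
                *slot = val;
                return true;
        }
        if (t->used == TOY_TABLE_CAP)
                return false;
        t->keys[t->used] = key;
        t->vals[t->used] = val;
        t->used++;
        return true;
}

/*
 * Generated code for "s[key] += 1": the NULL check and the overflow
 * check are emitted by the compiler, not written by the script author.
 * The first event stores 1 here; the dropmon example quoted in this
 * thread stores 0 on the first event.
 */
static bool toy_count(struct toy_table *t, uint64_t key)
{
        uint64_t *cnt;

        assert(t != NULL);

        cnt = toy_lookup(t, key);
        if (cnt) {
                *cnt += 1;
                return true;
        }
        return toy_update(t, key, 1);   /* false: abort the script */
}
```

The script writer only ever sees the one-line table increment. Whether
overflow aborts the script or is reported in some other way stays a
choice of the language compiler.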
> And obviously you missed add table lock/unlock in there.
good question :)
It's actually under rcu which is much faster than lock/unlock.
> In contrast, look ktap script with same functionality:
>
> var s ={}
>
> trace skb:kfree_skb {
> s[arg2] += 1
> }
>
> User don't need to handle error checking and table lock issue at all,
> both in source level and bytecode level.
agree that ktap syntax is less verbose than C.
bytecode is a different story.
> From end user point of view, they want clean language syntax like
> above ktap example, so if bpf have same dynamic tracing goal, it
> should follow this way.
sure. keep ktap syntax. The users should have multiple
choices to write their scripts.
Regards,
Alexei
On Wed, Apr 2, 2014 at 12:57 PM, Alexei Starovoitov <[email protected]> wrote:
> On Mon, Mar 31, 2014 at 9:47 PM, Jovi Zhangwei <[email protected]> wrote:
>> Hi Alexei,
>>
>> On Tue, Apr 1, 2014 at 5:29 AM, Alexei Starovoitov <[email protected]> wrote:
>>> On Mon, Mar 31, 2014 at 3:01 AM, Jovi Zhangwei <[email protected]> wrote:
>>>> Hi Ingo,
>>>>
>>>> On Mon, Mar 31, 2014 at 3:17 PM, Ingo Molnar <[email protected]> wrote:
>>>>>
>>>>> * Jovi Zhangwei <[email protected]> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> The following set of patches add ktap tracing tool.
>>>>>>
>>>>>> ktap is a new script-based dynamic tracing tool for Linux.
>>>>>> It uses a scripting language and lets the user trace system dynamically.
>>>>>>
>>>>>> Highlights features:
>>>>>> * a simple but powerful scripting language
>>>>>> * register-based interpreter (heavily optimized) in Linux kernel
>>>>>> * small and lightweight
>>>>>> * not depend on the GCC toolchain for each script run
>>>>>> * easy to use in embedded environments without debugging info
>>>>>> * support for tracepoint, kprobe, uprobe, function trace, timer, and more
>>>>>> * supported in x86, ARM, PowerPC, MIPS
>>>>>> * safety in sandbox
>>>>>
>>>>> I've asked this fundamental design question before but got no full
>>>>> answer: how does ktap compare to the ongoing effort of improving the
>>>>> BPF scripting engine?
>>>>>
>>>>
>>>> From long experiences of ktap development, what make me really
>>>> love ktap is:
>>>>
>>>> 1) Availability
>>>> ktap is only available tool to use in small embedded platform, stap
>>>> and BPF both need GCC now, stap have its own language, so it's much
>>>> better than BPF.
>>>> (IMO there may need several years to complete a skeleton of dynamic
>>>> tracing script language, see stap and dtrace)
>>>>
>>>> 2) Simplicity
>>>> ktap is simplest dynamic scripting trace solution now in Linux world,
>>>> compare with stap/dtrace/BPF.
>>>> a). It have simple syntax which make many people like it, it have
>>>> b). It have simple associate array, make dynamic tracing powerful.
>>>> c). It have a simple compiler which only have 87K in x86_64.
>>>> d). It have a simple tracing syntax which constant with perf events.
>>>>
>>>> 3) Safety
>>>> ktap already delivered its safety to end user, many people use ktap
>>>> in their dev lab to investigate problem.
>>>> But BPF need time to prove its safety, especially proved by end user,
>>>> and IMO BPF safety check would be more complex if the runtime
>>>> support more features as time goes.
>>>
>>> safety of ktap is arguable.
>>>
>>> 1.
>>> From the diff it seems that 'loop_count' is a dynamic way of
>>> checking that loops are not infinite, but max_loop_count = 100000
>>> if loop body has many instructions, such large count may trigger
>>> hung_task panic.
>>>
>> Actually I'm planing use time-based time to avoid this, minor issue.
>>
>>> 2.
>>> jumps are not counted, so if userspace makes an error and loads
>>> ktap bytecode with wrong jumps, the interpreter will hang.
>>>
>> There leave a todo in validation code, as kernel developers don't like
>> many todo in there, so I will also address it.
>>
>>> 3.
>>> recursive functions and f1()->f2()->f1() are not detected either.
>>> Another possible hang?
>>>
>> No, it will exit by ktap stack overflow check.
>>
>>> 4.
>>> bc_[ft]new instruction are allocating memory and garbage collector
>>> suppose to free things when ktap module is unloaded, right?
>>> since max_loop_cnt is 100k, a script can allocate quite a bit of memory
>>> and kernel will be waiting for userspace trigger to free it?
>>> Sounds dangerous.
>>>
>> There will have table/function number limitation, so this is not a problem.
>>
>>> These concerns are just from quick code review.
>
> ok. spotted few more things:
>
> 5.
> bc_callt (ktap tailcall) doesn't have loop_count check.
> so tailcall can loop forever.
> Of course you can fix it with elapsed time check at every branch
> or call instruction.
>
> 6.
> do_bc_kstr doesn't check that 'd' value is valid and goes kbase[~d]
> Can be fixed of course.
>
> 7.
> uget/uset and others seem to have similar problems.
>
> It seems that your definition of 'safe ktap' is that user cannot break
> kernel if he uses ktap scripting syntax.
> In that sense ktap is not much different from stap.
>
Definitely not.
Safety doesn't mean the bounds checks you listed above. ktap is safe
at the bytecode level by design, not at the syntax level: whatever
you do in the bytecode, it will never crash the kernel. Please see
more in ktapvm.
> Overall it seems you view ktap bytecode as a continuation
> of ktap syntax.
> ktap language allows to read pid,uid,tid, so they were added as
> separate instructions to ktap bytecode...
> ktap allows dump of a table, so kernel has to do tab_histdump()
> including sorting of fields and printf formatting.
> What if ktap user wants a different table dump?
> or new features from the language?
> keep extending bytecode for every printf tweak is not a great solution.
>
> I think design approach to ktap needs to change.
> What I'm proposing is the following:
> - keep ktap syntax as-is, but remove loops
> - ktap style of accessing tables is definitely less verbose then C,
> so keep it, but don't let compiled program to own the memory
> - keep table dump as-is, but do it in userspace instead
>
Your statements conflict here.
If the table dump moves into userspace, then your compiler must
support loops, but you also suggested removing loops from the compiler.
> In other words compiler for ktap scripts can generate kernel program
> and userspace program at the same time.
> the end users won't notice the difference vs what you have now.
>
> we should learn from BPF design mistake:
> BPF was
> - user space interface
> - safe instruction set
> - execution engine
> all at the same time and it was hard to extend it, since all
> aspects need to be considered.
>
> We need to break this dependency.
> 'internal bpf' is an execution engine.
> it's a low level assembler language like x86.
> Think of it as renamed x86 assembler, where registers
> are called r1, r2 instead of rdi, rsi.
>
> safety comes from verifier which is decoupled from execution.
> It can allow complex program or very dumb ones.
> Today classic bpf is still used as kernel-user interface,
> it goes through bpf checker and converted to 'internal bpf'
> for faster execution.
> In this case bpf checker allows all existing bpf programs,
> but ibpf execution engine can do a lot more.
>
> ktap can follow similar approach.
> Though I think C as a language to express filters is simpler,
> ktap syntax is fine as well.
> ktap compiler can generate ibpf instructions and let
> kernel verify them.
> ibpf verifier that I've posted earlier has enough knobs
> to be used as very strict or permissive depending on the
> kernel component, while both being safe from 'not crashing
> or hanging kernel' point of view.
> Like loops are always disallowed, all memory/register
> accesses must be valid, data and control dependency
> between instructions are checked.
>
> Best part is that ktap syntax and features can evolve
> without ever touching execution engine.
>
>> Wrong, ktap don't have vmalloc instruction, ktap only use
>> vmalloc for table and memory pool pre-allocation.
>
> allowing script to own allocated memory is where we diverge
> on the approach to safety.
> if script can loop or allocate memory, you'd need to dynamically
> track elapsed time, all allocated memory and all read/write
> accesses from the program, so execution engine slows
> down and becomes enforcer of safety.
> Every new instruction in such engine needs to be considered
> from safety point of view. The same problem plagued old bpf.
>
>> It's a big engineering problem, BPF bytecode is too low level,
>> BPF engine exposed too much low level stuff to end user, see bpf example:
>
> you're mixing layers here.
> 'internal bpf' is a low level execution engine.
> Nothing prevents userspace to have ktap or C or any other syntax.
>
>> void dropmon(struct bpf_context *ctx) {
>> void *loc;
>> uint64_t *drop_cnt;
>>
>> loc = (void *)ctx->arg2;
>>
>> drop_cnt = bpf_table_lookup(ctx, 0, &loc);
>> if (drop_cnt) {
>> __sync_fetch_and_add(drop_cnt, 1);
>> } else {
>> uint64_t init = 0;
>> bpf_table_update(ctx, 0, &loc, &init);
>> }
>> }
>>
>> IMO there have many issues in this simple script.
>>
>> If user forget add drop_cnt check, what will
>> happen, it will reference NULL pointer in __sync_fetch_and_add.
>> How to make sure drop_cn pointer is a valid memory address in table,
>> not other kernel memory allocation?
>
> good question :)
> The way verifier guarantees correctness is the following.
> 'bpf_table_lookup' is annotated as 'returns valid memory of size X
> or NULL', so verifiers follows that value in a register through control
> flow graph. In if(drop_cnt) branch, the drop_cnt is valid memory.
> In else branch, drop_cnt is null.
> I've explained it in better details in verifier patch and the doc.
>
>> Look bpf_table_update function, if bpf table overflow, there have
>> no way to stop script executing in there, which make completely
>> wrong things, so you have to add exit condition checking after
>> bpf_table_update(and maybe most C function calls).
>
> if you think 'table overflow' notification should be hidden
> from script writer, then go for it.
> ktap can generate ibpf that does bpf_table_update and
> aborts the script if the limit is hit.
> All these decisions are up to userspace and language compiler.
>
>> And obviously you missed add table lock/unlock in there.
>
> good question :)
> It's actually under rcu which is much faster then lock/unlock.
>
Not sure it's really a good idea to protect table operations with RCU
in probe context.
>> In contrast, look ktap script with same functionality:
>>
>> var s ={}
>>
>> trace skb:kfree_skb {
>> s[arg2] += 1
>> }
>>
>> User don't need to handle error checking and table lock issue at all,
>> both in source level and bytecode level.
>
> agree that ktap syntax is less verbose as C.
> bytecode is a different story.
>
>> From end user point of view, they want clean language syntax like
>> above ktap example, so if bpf have same dynamic tracing goal, it
>> should follow this way.
>
> sure. keep ktap syntax. The users should have multiple
> choices to write their scripts.
>
Ok, we have discussed ktap vs. ebpf a lot, so what comes to my
mind is:

1) ktap could generate lower-level bytecode than it does now.

   Basically I agree with this. It would be faster and would make ktapvm
   more lightweight, which is what I want to see.
   So I think the ktap bytecode engine may be able to integrate with ebpf
   some day, but that may also need some changes on the ebpf side,
   not only on the ktap side.

2) Keep ktap's simplicity and flexibility.

   Simplicity and flexibility are the great value of ktap; they are the
   reason why many people like it.

3) Evolve ktap features independently of the bytecode engine.

   The ktap features here include:
   clean syntax, associative array, aggregation, event management,
   timer, resource management, kstack, ustack, ring buffer,
   tapset/library, CTF, samples, etc.

   All these tracing features should evolve independently of the
   bytecode engine.
   Actually this is what ktap does now: the bytecode engine is already
   very independent, and it will be decoupled further in the future.
   This is important because we can then change the bytecode engine
   smoothly without breaking any existing features.

Basically I think the relationship between ktap and ebpf is not one of
competition; they complement each other.

So based on all this input, I suggest:

Put all these community efforts together and figure out the proper
design and implementation of a dynamic tracing tool. ktap can be a good
base to build upon, evolving together with ebpf into a unified kernel
script engine that finally serves dynamic tracing and networking (if
possible).

Our goal is the same and very clear: we really want a "simple & flexible
& safe" dynamic scripting tracing tool for Linux, one that can compare
with or even be better than DTrace. This is the motivation of the ktap
project.

Two solutions could be taken:

1) Upstream ktap into core trace and evolve it step by step, finally
   making an integrated bytecode engine. It's a long process, but I
   think it's worth it.

2) Move ktap back into staging, and graduate from staging after the
   code makes both the tracing people and the ebpf people happy.

The benefit is that the process will be under the eyes of the community.

Ingo, Steven, Greg, what do you think?

Thanks.
Jovi
* Jovi Zhangwei <[email protected]> wrote:
> > I think nothing stops ktap userspace to parse ktap language and
> > generate 'internal bpf' format. gcc is unnecessary here.
>
> It's a big engineering problem, [...]
Sorry, but it will become an even bigger engineering problem if it's
merged to the upstream kernel tree!
You need to solve known design and implementational issues before we
can even think about any upstream merge.
Thanks,
Ingo
* Alexei Starovoitov <[email protected]> wrote:
> [...]
>
> It seems that your definition of 'safe ktap' is that user cannot break
> kernel if he uses ktap scripting syntax.
> In that sense ktap is not much different from stap.
>
> Overall it seems you view ktap bytecode as a continuation
> of ktap syntax.
> ktap language allows to read pid,uid,tid, so they were added as
> separate instructions to ktap bytecode...
> ktap allows dump of a table, so kernel has to do tab_histdump()
> including sorting of fields and printf formatting.
> What if ktap user wants a different table dump?
> or new features from the language?
> keep extending bytecode for every printf tweak is not a great solution.
>
> I think design approach to ktap needs to change.
> What I'm proposing is the following:
> - keep ktap syntax as-is, but remove loops
> - ktap style of accessing tables is definitely less verbose then C,
> so keep it, but don't let compiled program to own the memory
> - keep table dump as-is, but do it in userspace instead
I'd suggest using C syntax instead initially, because that's what the
kernel is using.
The overwhelming majority of people probing the kernel are
programmers, so there's no point in inventing new syntax, we should
reuse existing syntax!
That is one reason why for example the (very simple!) ftrace filter
language tries to mimic C syntax.
Especially as C is simpler for an important category, filters:
> Though I think C as a language to express filters is simpler,
> ktap syntax is fine as well.
Thanks,
Ingo
* Jovi Zhangwei <[email protected]> wrote:
> So based on all these input, I suggest:
>
> Put all these community efforts together, figure out the proper
> design implementation of dynamic tracing tool, ktap can be a good
> start to build upon it, evolve to a unified kernel script engine
> with ebpf together, finally service for dynamic tracing and
> network(if possible).
>
> Our goal is same and very clearly, we really want a "simple &
> flexible & safe" dynamic scripting tracing tool for Linux, which
> could compare or even better than Dtrace, this is the motivation of
> ktap project.
>
> Two solution may be take:
>
> 1). upstream ktap into core trace and evolve it step by step, and
> finally make a integrated bytecode engine, it's a long process,
> but I think it's worth.
>
> 2). move ktap back into staging, and graduate from staging after the
> code make tracing people and ebpf people both happy.
>
> The benefit is the process will be under the eyes of community.
>
> Ingo, steven, Greg, what do you think?
For now I'm opting for a third option:
3) Maintain my NAK on the ktap patches until they address the
fundamental design concerns outlined by Alexei and others in
their review feedback:
NAKed-by: Ingo Molnar <[email protected]>
The thing is, I've outlined some of the concerns in my previous
review. Not much happened on that front, for example ktap did not get
any closer in integrating with BPF. Many months have passed since the
previous ktap submission, still I see no progress on the 'design'
front. That really needs to change.
Please keep me Cc:-ed to any and all future ktap submissions so I can
monitor ktap's progress and lift the NAK if the design concerns have
been addressed.
Thanks,
Ingo
On Wed, Apr 2, 2014 at 3:43 PM, Ingo Molnar <[email protected]> wrote:
>
> * Jovi Zhangwei <[email protected]> wrote:
>
>> So based on all these input, I suggest:
>>
>> Put all these community efforts together, figure out the proper
>> design implementation of dynamic tracing tool, ktap can be a good
>> start to build upon it, evolve to a unified kernel script engine
>> with ebpf together, finally service for dynamic tracing and
>> network(if possible).
>>
>> Our goal is same and very clearly, we really want a "simple &
>> flexible & safe" dynamic scripting tracing tool for Linux, which
>> could compare or even better than Dtrace, this is the motivation of
>> ktap project.
>>
>> Two solution may be take:
>>
>> 1). upstream ktap into core trace and evolve it step by step, and
>> finally make a integrated bytecode engine, it's a long process,
>> but I think it's worth.
>>
>> 2). move ktap back into staging, and graduate from staging after the
>> code make tracing people and ebpf people both happy.
>>
>> The benefit is the process will be under the eyes of community.
>>
>> Ingo, steven, Greg, what do you think?
>
> For now I'm opting for a third option:
>
> 3) Maintain my NAK on the ktap patches until they address the
> fundamental design concerns outlined by Alexei and others in
> their review feedback:
>
There are no fundamental design concerns outlined about ktap. The
'loop' issue in ktap's design is not fundamental, the other review
feedback focuses on the patches rather than the fundamental design,
and the "safety" concerns raised by Alexei are just bounds checking.
I don't see any others.
> NAKed-by: Ingo Molnar <[email protected]>
>
> The thing is, I've outlined some of the concerns in my previous
> review. Not much happened on that front, for example ktap did not get
> any closer in integrating with BPF.
>
I don't see any suggestion about integrating BPF before this review cycle.
> Many months have passed since the
> previous ktap submission, still I see no progress on the 'design'
> front. That really needs to change.
>
I think changes are needed in both ktap and ebpf, not only on the
ktap side, if integrating ktap and BPF is really the way to go.
It has not been proven that the current low-level ebpf bytecode design
is suitable for generic dynamic tracing. I also raised some design
concerns about ebpf in a previous mail but got no clear answer. For
example, it exposes too much low-level detail to the end user, which
is a problem because the script engine then has to validate more
memory references than it should need to. Table locking may also be
exposed to the user in future, which would be more complex IMO.
This effort needs the participation of everyone involved, not only
ktap. ktap and bpf may need to move closer to each other, but that
does not mean bpf stays put while ktap does all the moving.
That's why I think things would go faster and more effectively in a
public place (like the staging tree); otherwise the change may be
hard to make happen.
Thanks.
Jovi
* Jovi Zhangwei <[email protected]> wrote:
> On Wed, Apr 2, 2014 at 3:43 PM, Ingo Molnar <[email protected]> wrote:
> >
> > * Jovi Zhangwei <[email protected]> wrote:
> >
> >> So based on all these input, I suggest:
> >>
> >> Put all these community efforts together, figure out the proper
> >> design implementation of dynamic tracing tool, ktap can be a good
> >> start to build upon it, evolve to a unified kernel script engine
> >> with ebpf together, finally service for dynamic tracing and
> >> network(if possible).
> >>
> >> Our goal is same and very clearly, we really want a "simple &
> >> flexible & safe" dynamic scripting tracing tool for Linux, which
> >> could compare or even better than Dtrace, this is the motivation of
> >> ktap project.
> >>
> >> Two solution may be take:
> >>
> >> 1). upstream ktap into core trace and evolve it step by step, and
> >> finally make a integrated bytecode engine, it's a long process,
> >> but I think it's worth.
> >>
> >> 2). move ktap back into staging, and graduate from staging after the
> >> code make tracing people and ebpf people both happy.
> >>
> >> The benefit is the process will be under the eyes of community.
> >>
> >> Ingo, steven, Greg, what do you think?
> >
> > For now I'm opting for a third option:
> >
> > 3) Maintain my NAK on the ktap patches until they address the
> > fundamental design concerns outlined by Alexei and others in
> > their review feedback:
> >
>
> There is no fundamental design concerns outlined about ktap, [...]
Well, the concerns that were outlined do rise to fundamental in my
book. But as long as they are addressed it does not matter how we
classify them.
> I don't see any suggestion about integrating BPF before this review
> cycle.
Hm, I mentioned it at the kernel summit to folks who raised the ktap
subject, but apparently not over email. I assumed a new ktap
submission would come quickly.
Anyway, considering how BPF is integrating with (and hopefully
replacing) tracing filters, it makes sense to apply the same concept
in the ktap case as well.
Thanks,
Ingo
On Wed, Apr 02, 2014 at 09:42:03AM +0200, Ingo Molnar wrote:
> I'd suggest using C syntax instead initially, because that's what the
> kernel is using.
>
> The overwhelming majority of people probing the kernel are
> programmers, so there's no point in inventing new syntax, we should
> reuse existing syntax!
Yes please, keep it C, I forever forget all other syntaxes. While I have
in the past known other languages, I never use them frequently enough to
remember them. And there's nothing more frustrating than having to fight
a tool/language when you just want to get work done.
On Fri, Apr 4, 2014 at 3:36 PM, Ingo Molnar <[email protected]> wrote:
>
> * Jovi Zhangwei <[email protected]> wrote:
>
>> I don't see any suggestion about integrating BPF before this review
>> cycle.
>
> Hm, I mentioned it at the kernel summit to folks who raised the ktap
> subject, but apparently not over email. I assumed a new ktap
> submission would come quickly.
>
> Anyway, considering how BPF is integrating with (and hopefully
> replacing) tracing filters, it makes sense to apply the same concept
> in the ktap case as well.
>
It seems that kernel developers want a C-based tracing filter. BPF is
rooted in C (compiler, performance, and a networking target), but
ktap was designed from a different point of view, one that emphasizes
simplicity and flexibility. That simplicity and flexibility shows up
not only in the simple syntax but also in the bytecode engine.
    trace syscalls:* {
        print(cpu, pid, execname, argstr, stack())
    }
Each expression has its own type in the kernel; the type is not
determined by the userspace compiler. This makes the print expression
much easier and makes handling associative arrays (and aggregation)
extremely simple.
If we forcibly replace the ktap core with a much more "static"
bytecode engine, it will lose its headline features: simplicity and
flexibility.
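To make the runtime-typing point concrete, here is a minimal sketch.
It is written in Python for brevity and is not ktap code: the names
Tag, Value, format_value and vm_print are hypothetical, and the sample
values are made up. It only shows the general idea of a value that
carries its own type tag, so that one generic print routine can format
any mix of arguments without the compiler knowing their types.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Union


class Tag(Enum):
    """Runtime type tags (hypothetical names, not ktap's real ones)."""
    NUMBER = auto()
    STRING = auto()
    STACK = auto()


@dataclass(frozen=True)
class Value:
    """A value that carries its own type tag, as a dynamically typed VM would."""
    tag: Tag
    payload: Union[int, str, tuple]


def format_value(v: Value) -> str:
    """Choose the formatting at run time from the tag, not at compile time."""
    if v.tag is Tag.NUMBER:
        return str(v.payload)
    if v.tag is Tag.STRING:
        return v.payload
    if v.tag is Tag.STACK:
        return " <- ".join(v.payload)
    raise TypeError(f"unknown tag: {v.tag}")


def vm_print(*args: Value) -> str:
    """Model of print(cpu, pid, execname, ...): one routine for all types."""
    return "\t".join(format_value(a) for a in args)


# Made-up sample values standing in for cpu, pid, execname and stack().
line = vm_print(
    Value(Tag.NUMBER, 1),
    Value(Tag.NUMBER, 4242),
    Value(Tag.STRING, "bash"),
    Value(Tag.STACK, ("sys_read", "system_call_fastpath")),
)
print(line)
```

With a statically typed bytecode, the userspace compiler would instead
have to resolve each argument's type and emit type-specific formatting
code for every call site.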
ktap is not designed to be a built-in tracing filter, nor is it meant
to be a high-performance bytecode engine (one usable for networking).
BPF suits that role, but that is not a design flaw in ktap; it was
designed that way.
Obviously many people love ktap nowadays even though it's not a
C-family language, and people use ktap one-liners to do interesting
things (http://brendangregg.com/ktap.html) in the real world.
People in the embedded world especially like ktap; as far as I know,
they have already included ktap in OpenEmbedded.
It seems that more and more tracing-related kernel modules will
arrive in future (like https://github.com/draios/sysdig/issues/81),
so maybe these third-party kernel modules built on exported tracing
symbols should live in drivers/, not kernel/trace/. Today we have
well-defined kernel tracing interfaces (tracepoint, kprobe, uprobe,
perf callback), so there is no reason to put all these external
modules in the one place, kernel/trace/, especially if a module has
real end users.
Thanks.
Jovi
(2014/04/07 22:55), Peter Zijlstra wrote:
> On Wed, Apr 02, 2014 at 09:42:03AM +0200, Ingo Molnar wrote:
>> I'd suggest using C syntax instead initially, because that's what the
>> kernel is using.
>>
>> The overwhelming majority of people probing the kernel are
>> programmers, so there's no point in inventing new syntax, we should
>> reuse existing syntax!
>
> Yes please, keep it C, I forever forget all other syntaxes. While I have
> in the past known other languages, I never use them frequently enough to
> remember them. And there's nothing more frustrating than having to fight
> a tool/language when you just want to get work done.
Why wouldn't you write a kernel module in C directly? :)
It seems that all you need is not a tracing language or a bytecode
engine, but a well-organized tracing API (library?) for writing a
kernel module for tracing...
Thank you,
--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]
On Tue, Apr 08, 2014 at 04:40:36PM +0900, Masami Hiramatsu wrote:
> (2014/04/07 22:55), Peter Zijlstra wrote:
> > On Wed, Apr 02, 2014 at 09:42:03AM +0200, Ingo Molnar wrote:
> >> I'd suggest using C syntax instead initially, because that's what the
> >> kernel is using.
> >>
> >> The overwhelming majority of people probing the kernel are
> >> programmers, so there's no point in inventing new syntax, we should
> >> reuse existing syntax!
> >
> > Yes please, keep it C, I forever forget all other syntaxes. While I have
> > in the past known other languages, I never use them frequently enough to
> > remember them. And there's nothing more frustrating than having to fight
> > a tool/language when you just want to get work done.
>
> Why wouldn't you write a kernel module in C directly? :)
> It seems that all what you need is not a tracing language nor a bytecode
> engine, but an well organized tracing APIs(library?) for writing a kernel
> module for tracing...
Most of my kernels are CONFIG_MODULES=n :-) Also, I never can remember
how to do modules.
That said, what I currently do is hack the kernel with debug bits and
pieces and run that, which is effectively the same. It's just that it's
impossible to save/share these hacks in any sane fashion.
* Jovi Zhangwei <[email protected]> wrote:
> Obviously many people love ktap nowadays even though it's not a
> C-family language, [...]
Imagine how much more widespread it would become amongst kernel
developers if it had C syntax - see PeterZ's reply for example.
Thanks,
Ingo
On 04/14/2014 05:11 PM, Ingo Molnar wrote:
> * Jovi Zhangwei <[email protected]> wrote:
>
>> Obviously many people love ktap nowadays even though it's not a
>> C-family language, [...]
>
> Imagine how much more widespread it would become amongst kernel
> developers if it had C syntax - see PeterZ's reply for example.
+1
I think it would be awesome to reuse the kernel's BPF engine
with its C backend for this, in the way that Alexei has started
in his original set. Surely there's a lot of work that still
needs to be addressed, but long-term it would leave just that
one engine to maintain, and would also let us exploit its
flexibility and speed (JIT) where we can.