Message-ID: <52398FF1.5020502@redhat.com>
Date: Wed, 18 Sep 2013 13:35:13 +0200
From: Denys Vlasenko <dvlasenk@redhat.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7
MIME-Version: 1.0
To: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: Tom Zanussi <tzanussi@gmail.com>, Steven Rostedt <srostedt@redhat.com>,
        Ingo Molnar <mingo@elte.hu>, Jiri Olsa <jolsa@redhat.com>,
        Masami Hiramatsu <mhiramat@redhat.com>,
        Oleg Nesterov <oleg@redhat.com>, linux-kernel@vger.kernel.org,
        Denys Vlasenko <vda.linux@googlemail.com>
Subject: Re: [RFC] Full syscall argument decode in "perf trace"
References: <20130917190606.GB3918@infradead.org>
In-Reply-To: <20130917190606.GB3918@infradead.org>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2598
Lines: 68

On 09/17/2013 09:06 PM, Arnaldo Carvalho de Melo wrote:
> Em Tue, Sep 17, 2013 at 05:10:55PM +0200, Denys Vlasenko escreveu:
>> I'm trying to figure out how to extend "perf trace".
>  
>> Currently, it shows syscall names and arguments, and only them.
>> Meaning that syscalls such as open(2) are shown as:
>  
>>     open(filename: 140736118412184, flags: 0, mode: 140736118403776) = 3
>  
>> The problem is, of course, that user wants to see the filename
>> per se, not the address of its first byte.
>  
>> To improve that, we need to fetch the pointed-to data.
>> There are two approaches to this: extending
>> "raw_syscalls:sys_{enter,exit}" tracepoint so that it returns this data,
>> or selectively stopping the traced process when it reaches the tracepoint.
> 
> We don't want to stop the process at all, this is one of the major
> advantages of 'perf trace' over 'strace'.

This is a worthy goal. strace is slow precisely because it stops
the traced process so often. strace developers do want to avoid
as many of these stops as possible.

I'm not sure that "not stopping ever" is achievable, though.
There are cases where stopping is necessary.

For example, after a clone() call, depending on the tracer's needs,
there may be operations which must be done on the new child
before it is allowed to run.

strace used to use hideous, unsafe workarounds to catch children,
until ptrace was augmented with features which made children stop
immediately.

Do you think you can work around that? I just don't see how.

> Look at the tmp.perf/trace2 branch in my git repo, tglx and Ingo added a
> tracepoint to vfs_getname to use that.

Yes, I know that this is the way to fetch syscall args without
stopping.

The problem: ~100 more tracepoints need to be added merely to get
to the point where strace already is, wrt quality of syscall decoding.
strace has nearly 300 separate custom syscall formatting functions,
some of them quite complex.
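
To give an idea of what even the simplest of these formatting functions
has to do, here is a minimal sketch of a decoder for the flags argument
of open(2). It is not taken from strace or perf; the names
format_open_flags, flag_name and append are hypothetical, and the flag
table is deliberately incomplete (any bits it does not know are printed
in hex).

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper types and names; not strace or perf code. */
struct flag_name {
	int value;
	const char *name;
};

/* Deliberately incomplete table of open(2) flag bits. */
static const struct flag_name open_flag_names[] = {
	{ O_CREAT,    "O_CREAT"    },
	{ O_EXCL,     "O_EXCL"     },
	{ O_NOCTTY,   "O_NOCTTY"   },
	{ O_TRUNC,    "O_TRUNC"    },
	{ O_APPEND,   "O_APPEND"   },
	{ O_NONBLOCK, "O_NONBLOCK" },
};

/* Append src to the string in buf, truncating if buf (size bytes) is full. */
static void append(char *buf, size_t size, const char *src)
{
	size_t used = strlen(buf);

	if (used + 1 < size)
		snprintf(buf + used, size - used, "%s", src);
}

/* Render flags symbolically into buf, e.g. "O_WRONLY|O_CREAT|O_TRUNC". */
const char *format_open_flags(int flags, char *buf, size_t size)
{
	size_t i;

	if (size == 0)
		return buf;
	buf[0] = '\0';

	/* The access mode is a small enumeration, not a bit mask. */
	switch (flags & O_ACCMODE) {
	case O_RDONLY: append(buf, size, "O_RDONLY"); break;
	case O_WRONLY: append(buf, size, "O_WRONLY"); break;
	case O_RDWR:   append(buf, size, "O_RDWR");   break;
	default:       append(buf, size, "O_ACCMODE"); break;
	}
	flags &= ~O_ACCMODE;

	/* The remaining bits are independent flags. */
	for (i = 0; i < sizeof(open_flag_names) / sizeof(open_flag_names[0]); i++) {
		if (flags & open_flag_names[i].value) {
			append(buf, size, "|");
			append(buf, size, open_flag_names[i].name);
			flags &= ~open_flag_names[i].value;
		}
	}

	/* Bits missing from the table are shown raw rather than dropped. */
	if (flags) {
		char hex[32];

		snprintf(hex, sizeof(hex), "|0x%x", (unsigned int)flags);
		append(buf, size, hex);
	}
	return buf;
}
```

A caller passes the raw flags value from the syscall together with a
scratch buffer, for instance format_open_flags(flags, buf, sizeof(buf)).
Decoding the filename is the harder part, since that requires reading
the traced process's memory.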

If we need to add a syscall stopping feature (which, as I said above,
will be necessary anyway IMO), then syscall decoding can be as good
as strace's *already*. Then, gradually, more tracepoints can be added
to make it faster.

I am thinking about going in this direction.

Therefore my question should be restated as:

Would perf developers accept a "syscall pausing" feature,
or would it be rejected?

-- 
vda
