Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753326AbbKZJlH (ORCPT ); Thu, 26 Nov 2015 04:41:07 -0500
Received: from mail-wm0-f46.google.com ([74.125.82.46]:38526 "EHLO
	mail-wm0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753134AbbKZJlC (ORCPT ); Thu, 26 Nov 2015 04:41:02 -0500
Date: Thu, 26 Nov 2015 10:40:57 +0100
From: Ingo Molnar
To: "Wangnan (F)"
Cc: Peter Zijlstra , Yunlong Song , paulus@samba.org, mingo@redhat.com,
	acme@kernel.org, linux-kernel@vger.kernel.org, namhyung@kernel.org,
	ast@kernel.org, masami.hiramatsu.pt@hitachi.com, kan.liang@intel.com,
	adrian.hunter@intel.com, jolsa@kernel.org, dsahern@gmail.com,
	bp@alien8.de, jean.pihet@linaro.org, rric@kernel.org,
	xiakaixu@huawei.com, hekuang@huawei.com
Subject: Re: [PATCH] perf record: Add snapshot mode support for perf's regular events
Message-ID: <20151126094057.GA7302@gmail.com>
References: <1448373632-8806-1-git-send-email-yunlong.song@huawei.com>
	<20151125092728.GZ17308@twins.programming.kicks-ass.net>
	<565582E0.7070202@huawei.com>
	<20151125122038.GA17308@twins.programming.kicks-ass.net>
	<5655AF89.8070907@huawei.com>
	<20151126091910.GA6380@gmail.com>
	<20151126092738.GA6793@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151126092738.GA6793@gmail.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3488
Lines: 73

* Ingo Molnar wrote:

>
> * Ingo Molnar wrote:
>
> > > But yes, we can do that userspace ring buffer when we really need it. At
> > > very first we can start working on perf side and assume overwrite mode is
> > > ready.
> >
> > I don't think Peter asked for much: pick up the patch he has already written
> > and use it, to have an even lower overhead always-enabled background tracing
> > mode of perf.
> >
> > Resizing shouldn't be much of an issue with existing features: if events start
> > overflowing or some other threshold for dynamic increase of the ring-buffer is
> > met then the daemon should open a new set of events with a larger ring-buffer,
> > and close the old events once the new tracing ring-buffer is up and running.
> >
> > Use event multiplexing to output all interesting events into the same single
> > (per CPU) ring-buffer.
>
> Btw., there's another trick we could use to support ftrace-alike workflows even
> better: we could expose a task's active perf ring-buffers under /proc// and
> could make it readable.
>
> So if an overwrite-mode background tracing session is running, you don't even
> have to signal it to capture the ring-buffer: just open the ring-buffer fd in
> procfs, under /proc/XYZ/perf/ring-buffers/5.trace or so, and dump its current
> contents, assuming the task doing that has sufficient permissions - i.e.
> ptrace_may_access().
>
> We could even pretty-print some very basic version of the records from the
> kernel, via /proc/XYZ/perf/ring-buffers/5.txt, to support a tooling-less tracing
> modes. This way perf based tracing could be supported even on systems that have
> no writable filesystems.
>
> I.e. in this regard perf can be made to match ftrace's tracing workflow as well
> - in addition to the more traditional perf profiling workflow we all love and
> know!

Also note that if we go in this direction then with some additional changes we
could also support lightweight tracing with no tooling side at all on the traced
system: a simple kernel feature with a kernel thread could be added that takes a
list of events from sysfs or debugfs and opens them system-wide and exposes
per-cpu overwrite mode ring-buffers. Those ring-buffers can then be accessed via
procfs (and/or also be exposed in parallel via debugfs).

The kernel thread never actually does anything except set up the events - i.e.
this is a very lightweight mode of always-on tracing.

Additional debugfs toggles can be added to temporarily turn tracing on/off
without closing the events - just like ftrace. Other toggles could be added, such
as: 'stop tracing when the kernel has crashed, or if a specific event has
occurred or a condition has been met'.

That way we could, among other things, capture traces on embedded systems and
copy the traces to another, larger system (or NFS-mount the target system), and
run perf tooling to analyze the traces on that more powerful system.

But it all starts with making overwrite mode work well, and working with the
kernel visible ring-buffer. That can then be exposed to user-space in very
expressive ways to turn perf into a flexible system tracing subsystem as well.

Thanks,

	Ingo
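
For readers less familiar with the "overwrite mode" semantics the proposal
above depends on, here is a rough user-space sketch in C. It is only a toy
model and not the kernel's perf ring-buffer code: the names toy_ring,
toy_ring_write(), toy_ring_snapshot() and the capacity RB_CAPACITY are all
hypothetical and were picked for illustration. It shows the two properties
the mail relies on: the writer always overwrites the oldest record instead
of stalling, and a reader can dump the current contents at any moment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity, in records; real perf buffers are sized in pages. */
#define RB_CAPACITY 8

/* Toy overwrite-mode ring buffer (illustrative only, not kernel code). */
struct toy_ring {
	unsigned long head;        /* total number of records ever written */
	int records[RB_CAPACITY];  /* fixed storage; oldest slot is overwritten */
};

static void toy_ring_init(struct toy_ring *rb)
{
	assert(rb != NULL);
	memset(rb, 0, sizeof(*rb));
}

/* The writer never blocks and never drops new data: once the buffer is
 * full, each write lands on the slot holding the oldest record. */
static void toy_ring_write(struct toy_ring *rb, int record)
{
	rb->records[rb->head % RB_CAPACITY] = record;
	rb->head++;
}

/* Copy the surviving records into out[], oldest first, and return how
 * many were copied. If out[] is too small, the newest records win. */
static size_t toy_ring_snapshot(const struct toy_ring *rb, int *out,
				size_t out_len)
{
	size_t count = rb->head < RB_CAPACITY ? rb->head : RB_CAPACITY;
	unsigned long start = rb->head - count;
	size_t i;

	if (count > out_len) {
		start += count - out_len;
		count = out_len;
	}
	for (i = 0; i < count; i++)
		out[i] = rb->records[(start + i) % RB_CAPACITY];
	return count;
}
```

Taking a snapshot does not consume anything, so the writer can keep going
while any number of readers dump the buffer. A real implementation would
additionally need memory barriers and variable-sized records, which this
sketch leaves out.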