From: Daniel Wagner
To: Alexei Starovoitov
Subject: Re: [PATCH v0] bpf: BPF based latency tracing
Date: Fri, 19 Jun 2015 08:07:37 +0200
Message-ID: <5583B1A9.60503@bmw-carit.de>
In-Reply-To: <5582FA7A.4050505@plumgrid.com>
References: <1434627604-9624-1-git-send-email-daniel.wagner@bmw-carit.de>
 <5582FA7A.4050505@plumgrid.com>

On 06/18/2015 07:06 PM, Alexei Starovoitov wrote:
> On 6/18/15 4:40 AM, Daniel Wagner wrote:
>> BPF offers another way to generate latency histograms. We attach
>> kprobes at trace_preempt_off and trace_preempt_on and calculate the
>> time between seeing the off and the on transition.
>>
>> The first array is used to store the start time stamp. The key is the
>> CPU id. The second array stores the log2(time diff). We need to use
>> static allocation here (arrays and not hash tables). The kprobes
>> hooking into trace_preempt_on|off must not call into any dynamic
>> memory allocation or free path, because we need to avoid getting
>> called recursively. Besides that, it reduces jitter in the measurement.
>>
>> CPU 0
>>      latency        : count     distribution
>>        1 -> 1       : 0        |                                        |
>>        2 -> 3       : 0        |                                        |
>>        4 -> 7       : 0        |                                        |
>>        8 -> 15      : 0        |                                        |
>>       16 -> 31      : 0        |                                        |
>>       32 -> 63      : 0        |                                        |
>>       64 -> 127     : 0        |                                        |
>>      128 -> 255     : 0        |                                        |
>>      256 -> 511     : 0        |                                        |
>>      512 -> 1023    : 0        |                                        |
>>     1024 -> 2047    : 0        |                                        |
>>     2048 -> 4095    : 166723   |*************************************** |
>>     4096 -> 8191    : 19870    |***                                     |
>>     8192 -> 16383   : 6324     |                                        |
>>    16384 -> 32767   : 1098     |                                        |
>
> nice useful sample indeed!
> The numbers are non-JITed, right?

You are right. I forgot to enable the JIT compiler.

> JIT should reduce the measurement cost 2-3x, but preempt_on/off
> latency probably will stay in 2k range.

Here is an example output with JIT enabled, after a few seconds:

     latency        : count     distribution
       1 -> 1       : 0        |                                        |
       2 -> 3       : 0        |                                        |
       4 -> 7       : 0        |                                        |
       8 -> 15      : 0        |                                        |
      16 -> 31      : 0        |                                        |
      32 -> 63      : 0        |                                        |
      64 -> 127     : 0        |                                        |
     128 -> 255     : 0        |                                        |
     256 -> 511     : 0        |                                        |
     512 -> 1023    : 0        |                                        |
    1024 -> 2047    : 0        |                                        |
    2048 -> 4095    : 11595    |*************************************** |
    4096 -> 8191    : 3086     |*********                               |
    8192 -> 16383   : 392      |                                        |
   16384 -> 32767   : 114      |                                        |
   32768 -> 65535   : 16       |                                        |
   65536 -> 131071  : 1        |                                        |
  131072 -> 262143  : 0        |                                        |
  262144 -> 524287  : 0        |                                        |
  524288 -> 1048575 : 34       |                                        |

As you predicted, it won't go below the 2k range.

>> I am not sure if it is really worth spending more time getting
>> the hash table working for the trace_preempt_[on|off] kprobes.
>> There are so many things which could go wrong, so going with
>> a static version seems to me the right choice.
>
> agree. for this use case arrays are better choice anyway.
> But I'll keep working on getting hash tables working even
> in these extreme conditions. bpf should always be rock solid.
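Just to make the mechanism easier to follow: the hunk quoted further down
only shows the trace_preempt_off side. Roughly, the two arrays and the
matching trace_preempt_on probe look like this. This is only a sketch of
what the commit message describes; map names, sizes and the open-coded
log2 helper are illustrative (the log2 is unrolled because the verifier
rejects loops), not necessarily what the final patch uses:

#include <linux/ptrace.h>
#include <linux/version.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

#define MAX_CPU		128	/* illustrative sizes */
#define MAX_BUCKETS	64	/* one slot per possible log2(delta) value */

/* start timestamps, key = CPU id (used by the trace_preempt_off probe) */
struct bpf_map_def SEC("maps") my_map = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u64),
	.max_entries = MAX_CPU,
};

/* latency histogram, MAX_BUCKETS counters per CPU */
struct bpf_map_def SEC("maps") my_hist = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(long),
	.max_entries = MAX_CPU * MAX_BUCKETS,
};

/* branchless log2 for a u64, since loops are not allowed */
static unsigned int log2l(u64 v)
{
	unsigned int r, shift;

	r     = (v > 0xFFFFFFFF) << 5; v >>= r;
	shift = (v > 0xFFFF)     << 4; v >>= shift; r |= shift;
	shift = (v > 0xFF)       << 3; v >>= shift; r |= shift;
	shift = (v > 0xF)        << 2; v >>= shift; r |= shift;
	shift = (v > 0x3)        << 1; v >>= shift; r |= shift;
	return r | (v >> 1);
}

SEC("kprobe/trace_preempt_on")
int bpf_prog2(struct pt_regs *ctx)
{
	int cpu = bpf_get_smp_processor_id();
	u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);

	if (ts) {
		/* delta between the off and the on transition */
		u64 delta = bpf_ktime_get_ns() - *ts;
		int key = cpu * MAX_BUCKETS + log2l(delta);
		long *cnt = bpf_map_lookup_elem(&my_hist, &key);

		if (cnt)
			__sync_fetch_and_add(cnt, 1);
	}
	return 0;
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;

The user space part then just walks the second array per CPU and prints
the distribution shown above.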
> I'm only a bit suspicious of kprobes, since we have:
> NOKPROBE_SYMBOL(preempt_count_sub)
> but trace_preempt_on() called by preempt_count_sub()
> doesn't have this mark...

The original commit indicates that anything called from preempt_disable()
should also be marked NOKPROBE_SYMBOL:

  commit 43627582799db317e966ecb0002c2c3c9805ec0f
  Author:    Srinivasa Ds      Sun Feb 24 00:24:04 2008
  Committer: Linus Torvalds    Sun Feb 24 02:13:24 2008
  Original File: kernel/sched.c

      kprobes: refuse kprobe insertion on add/sub_preempt_counter()

      Kprobes makes use of preempt_disable(),preempt_enable_noresched()
      and these functions inturn call add/sub_preempt_count(). So we
      need to refuse user from inserting probe in to these functions.
      This patch disallows user from probing add/sub_preempt_count().

Obviously, marking them would render this patch useless.

>> +SEC("kprobe/trace_preempt_off")

BTW, is there a reason for not supporting built-in tracepoints/events? It
looks like it is only an artificial limitation of bpf_helpers.

>> +int bpf_prog1(struct pt_regs *ctx)
>> +{
>> +	int cpu = bpf_get_smp_processor_id();
>> +	u64 *ts = bpf_map_lookup_elem(&my_map, &cpu);
>> +
>> +	if (ts)
>> +		*ts = bpf_ktime_get_ns();
>
> btw, I'm planning to add native per-cpu maps which will
> speed up things more and reduce measurement overhead.

Funny, I was about to suggest something like this :)

> I think you can retarget this patch to net-next and send
> it to netdev. It's not too late for this merge window.

I'll rebase it onto net-next.

cheers,
daniel