Received: by 10.223.185.116 with SMTP id b49csp3151978wrg; Mon, 12 Feb 2018 21:56:01 -0800 (PST) X-Google-Smtp-Source: AH8x2275q5+zsp15HhS2YCWHI46TDiKYhAwdnVUaUV36UTkCnTwSL6lZ/9kZB3bnRGAGMOSaLbAP X-Received: by 10.98.138.129 with SMTP id o1mr119603pfk.182.1518501360906; Mon, 12 Feb 2018 21:56:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1518501360; cv=none; d=google.com; s=arc-20160816; b=Bp0E/VgkkOtQUMzQHrP35nliEZX0R6BNL9LLBU3RNP+xZeqEhxvzrN/hPDgdxBvguJ pejmLiTkNG3r32W7zWKLywJyybmI6TBA3iImQ/bqwKUuD4pyoDJABmVHJOhlrU/8Vk1Q EVS1jure2C9Pm90hwyjajgfsNTBY+Sx9Rn+2SQDimVHXl92U0PrXNvPTStbk6fYzIp15 jt616Ret+lFoX75NWnnqKDawt5P8rjamQbE955trevemi7NQEGr6imATCIIVExzrKviL z75YITZbrdcR4qCDcz8Mbv6aabP3HCYF3YbNQDDe9U1raOTbquGN1uo2f/hCtQvfD7m4 uktw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :references:in-reply-to:mime-version:dkim-signature :arc-authentication-results; bh=53w4NtKOeyX6Bad5ns7zXkYhMOIpxG94jeGf7s7g2Mk=; b=yPfW0A70BCJHsBOA1n5QBMBZmKpbnjhNFr1bppEb+x056vzsYB3z5bBzO4jVOVUud+ w4Lqe4FZdIRDVT1P3P51SvsJnttLBSZLDC3fINZ9KtLI7pgKdYKF7NkTG1n4b6GF+s9p /8bQV1htqVVAFmvF+6E/tQOwwOPskstihG3dX5TPhrl0/X4DYNKRobmNAa/4ldrIGoiM tHWPB3hTRznZc62MQGV+IsOUynMyXNzMqBwwB2r/+e+y6Z9Lrma2O3JOx8VWPpjAQ9UA XSatK76bjOqycb4Xb+8pVwHDtM5VE+IhTuITqmUaV2HiZ7iT/QQxKng7OfQNUQ/8r3hs foyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kent-edu.20150623.gappssmtp.com header.s=20150623 header.b=bJ7kaQST; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n78si943012pfj.337.2018.02.12.21.55.46; Mon, 12 Feb 2018 21:56:00 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kent-edu.20150623.gappssmtp.com header.s=20150623 header.b=bJ7kaQST; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751830AbeBMFyn (ORCPT + 99 others); Tue, 13 Feb 2018 00:54:43 -0500 Received: from mail-ot0-f196.google.com ([74.125.82.196]:45225 "EHLO mail-ot0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751420AbeBMFyl (ORCPT ); Tue, 13 Feb 2018 00:54:41 -0500 Received: by mail-ot0-f196.google.com with SMTP id 73so16222553oti.12 for ; Mon, 12 Feb 2018 21:54:41 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kent-edu.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=53w4NtKOeyX6Bad5ns7zXkYhMOIpxG94jeGf7s7g2Mk=; b=bJ7kaQSTcjtVOtf5QvfoRffJS4ottBvMz8NR1cGFZRck6K+LONzgnKG7SpUlXTevYs EYkpFoJGERyLaOXvVTRh5owkC+qeWczq8cFF/TEMmLnpilLtFWAmM43KovbR9dSXL2NX 6btxgOM17P5GQvIioGWS+o4in1pOvgLoDmOgn0e/vBeZ+f4n7gnX5Zbxsyo/uCppHOaE zxzbszHKxNwVwQf4Mwmj6fI2g8H32onTKcu9BUz3vWaJTiw0VNWYX5//09MqVfmxU0sw aD08GgQ8JJnL2QZ3hVdtkTy/aRRkzsxmHhmVqjtlqBDZfzjZJJ5TpqNR3/fxue2u1h7p H3tA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=53w4NtKOeyX6Bad5ns7zXkYhMOIpxG94jeGf7s7g2Mk=; b=OhaAauSKlTXonH7qkT0VWA6YbxWF1dE1hn95dckFfiFDo7FBJxkJVK5tKjALmepP4v P3splvurzr/2N6851yHSs3vX4FibEKMIsA30gs6DbdeVGKy3Dhpy1jWn/9WxU3vOS+7i VkB0ZdKPW9ZwpYxi002Iob0vU6ZLyGVhA1hyw0B/nca5YU729j5mqbpQJCFDBK2kuHfo 2o7l5gHDsm9BV1JA5UWqN72DGS/igeMB25PZT398G6KSH1MI1F4qXZBMbCUwaLAz7Dmp 3FarH/kEsghAgwCOI/TgzU6HSc0R+5Y9edw7RAZ6ypJUq1T+FN0aGQvnPfDMrJbqOevD o/rA== X-Gm-Message-State: APf1xPCiwH824Irry5n7pohFaLIMbAjMkmnZu7BliFgtUkFYzDBWip9v NTU/A4F6hR8bUcQZGWRKP8NyvSz3TFEox5Q6xxliCw== X-Received: by 10.157.14.210 with SMTP id 76mr52659otj.263.1518501280627; Mon, 12 Feb 2018 21:54:40 -0800 (PST) MIME-Version: 1.0 Received: by 10.157.38.103 with HTTP; Mon, 12 Feb 2018 21:54:40 -0800 (PST) In-Reply-To: <20180213012903.ef6b0efe5b896994ef8fc713@kernel.org> References: <20180213012903.ef6b0efe5b896994ef8fc713@kernel.org> From: "Md. Islam" Date: Tue, 13 Feb 2018 00:54:40 -0500 Message-ID: Subject: Re: [PATCH net-next] trace_events_filter: conditional trace event (tcp_probe full=0) To: Masami Hiramatsu Cc: tom.zanussi@linux.intel.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, stephen@networkplumber.org, mathieu.desnoyers@polymtl.ca, Steven Rostedt Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Feb 12, 2018 at 11:29 AM, Masami Hiramatsu wrote: > On Mon, 12 Feb 2018 00:08:46 -0500 > "Md. Islam" wrote: > >> Recently tcp_probe kernel module has been replaced by trace_event. Old >> tcp_probe had full=0 option where it only takes a snapshot only when >> congestion window is changed. However I did not find such >> functionality in trace_event. > > Yes, that seems broken to me. You should filter by using perf script or > bpf. I'm not so clear about network stack, but it seems that cwnd can be > set for each tcp connection. This means "current snd_cwnd" must be stored > for each connection. > >> This is why I implemented this >> "conditional trace_event" where a snapshot is taken only if some field >> is changed. For instance, following filter will record a snapshot only >> if snd_cwnd field is changed from the previous snapshot: >> >> cd /sys/kernel/debug/tracing >> echo 1 > events/tcp/tcp_probe/enable >> echo "(sport == 8554 && snd_cwnd =< 0)" > events/tcp/tcp_probe/filter & >> cat trace_pipe > /home/tamim/net-next/trace.data > > Sounds interesting, but it seems like a hack. > Filter pred is stored for each trace event (tracepoint) so it is shared > among several context. At least you need a lock to exclude each thread > to update it. And I don't recommend to update filter_pred while tracing, > since it means we have a kind of "hidden state" in the filer. And again, > that filter_pred is shared, other tcp connection can see it. > (sport can be same as other connection's sport from other IP) > > So IMHO, this kind of hack we don't implement in the kernel. Instead, > you can use perf-script and program it by python or perl. It gives you > much safer way to filter it out. > > Or, please try to make it as complete feature. For example, maybe we can > introduce a global 'named' variable (which has correct accessor with spinlock > or atomic update), and you can refer it with name. > Then, it becomes safe and more generic, like below; > > $ cd /sys/kernel/debug/tracing > $ touch vars/cwnd > $ echo '(sport == 12345 && snd_cwnd != $cwnd)' > events/tcp/tcp_probe/filter > $ echo 'update_var:cwnd:snd_cwnd if traced' > events/tcp/tcp_probe/trigger > > Thank you, Hi Masami Thanks for the suggestions! It does makes sense. I created a global atomic variable in predicate. Then I'm updating that based on the filter. Now it can be done as below: cd /sys/kernel/debug/tracing && echo 1 > events/tcp/tcp_probe/enable && echo "(sport == 8554 && snd_cwnd != \$cwnd)" > events/tcp/tcp_probe/filter The filter identify the global variable by the presence of $. However needed to pass '\$' to catch $ in kernel. The patch is : diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h index 2a6d032..15a83c7 100644 --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1402,6 +1402,7 @@ struct regex { struct filter_pred { filter_pred_fn_t fn; u64 val; + atomic64_t global_var; struct regex regex; unsigned short *ops; struct ftrace_event_field *field; @@ -1427,6 +1428,16 @@ static inline bool is_function_field(struct ftrace_event_field *field) return field->filter_type == FILTER_TRACE_FN; } +/// The string prerdicate is a global variable if it starts with $. +/// \param field predicate +/// \return +static inline bool is_global_variable(char *field) +{ + if(strlen(field) == 0) + return false; + return field[0] == '$'; +} + extern enum regex_type filter_parse_regex(char *buff, int len, char **search, int *not); extern void print_event_filter(struct trace_event_file *file, diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c index 61e7f06..8bf43353 100644 --- a/kernel/trace/trace_events_filter.c +++ b/kernel/trace/trace_events_filter.c @@ -202,6 +202,15 @@ static int filter_pred_##size(struct filter_pred *pred, void *event) \ return match; \ } +#define DEFINE_GLOBAL_PRED(size) \ +static int filter_global_pred_##size(struct filter_pred *pred, void *event) \ +{ \ + u##size *addr = (u##size *)(event + pred->offset); \ + u##size old_global_var = (u##size)atomic64_xchg(&pred->global_var, (u64)*addr); \ + return (old_global_var == *addr) ^ pred->not; \ +} + + DEFINE_COMPARISON_PRED(s64); DEFINE_COMPARISON_PRED(u64); DEFINE_COMPARISON_PRED(s32); @@ -216,6 +225,11 @@ DEFINE_EQUALITY_PRED(32); DEFINE_EQUALITY_PRED(16); DEFINE_EQUALITY_PRED(8); +DEFINE_GLOBAL_PRED(64); +DEFINE_GLOBAL_PRED(32); +DEFINE_GLOBAL_PRED(16); +DEFINE_GLOBAL_PRED(8); + /* Filter predicate for fixed sized arrays of characters */ static int filter_pred_string(struct filter_pred *pred, void *event) { @@ -988,6 +1002,29 @@ static bool is_legal_op(struct ftrace_event_field *field, enum filter_op_ids op) return true; } +static filter_pred_fn_t select_global_fn(enum filter_op_ids op, + int field_size, int field_is_signed) +{ + filter_pred_fn_t fn = NULL; + + switch (field_size) { + case 8: + fn = filter_global_pred_64; + break; + case 4: + fn = filter_global_pred_32; + break; + case 2: + fn = filter_global_pred_16; + break; + case 1: + fn = filter_global_pred_8; + break; + } + + return fn; +} + static filter_pred_fn_t select_comparison_fn(enum filter_op_ids op, int field_size, int field_is_signed) { @@ -1066,7 +1103,15 @@ static int init_pred(struct filter_parse_state *ps, parse_error(ps, FILT_ERR_IP_FIELD_ONLY, 0); return -EINVAL; } - } else { + } else if(is_global_variable(pred->regex.pattern)){ + atomic64_set(&pred->global_var, 0); + fn = select_global_fn(pred->op, field->size, + field->is_signed); + if (!fn) { + parse_error(ps, FILT_ERR_INVALID_OP, 0); + return -EINVAL; + } + }else { if (field->is_signed) ret = kstrtoll(pred->regex.pattern, 0, &val); else I didn't however updating it in predicate yet. I'm updating the global variable using atomic64_xchg() in the filter. Do you still prefer to implement it in trigger? Please let me know any suggestion regrading the patch. Many thanks Tamim > >> >> This will record a snapshot where source port is 8554 and snd_cwnd is >> not same as the snd_cwnd in previous snapshot. For lack of better >> operator, I used "=<" to represent the field on which the filter will >> be applied. >> >> The way it works is, it updates the predicate with new value every >> time. So the next time, if the field is same as in the predicate, the >> snapshot is ignored. Initially it checks if the field is 0 or not. >> >> diff --git a/kernel/trace/trace_events_filter.c >> b/kernel/trace/trace_events_filter.c >> index 61e7f06..8d733fa 100644 >> --- a/kernel/trace/trace_events_filter.c >> +++ b/kernel/trace/trace_events_filter.c >> @@ -39,6 +39,7 @@ enum filter_op_ids >> OP_AND, >> OP_GLOB, >> OP_NE, >> + OP_NE_PREV, >> OP_EQ, >> OP_LT, >> OP_LE, >> @@ -62,6 +63,7 @@ static struct filter_op filter_ops[] = { >> { OP_AND, "&&", 2 }, >> { OP_GLOB, "~", 4 }, >> { OP_NE, "!=", 4 }, >> + { OP_NE_PREV, "=<", 4 }, >> { OP_EQ, "==", 4 }, >> { OP_LT, "<", 5 }, >> { OP_LE, "<=", 5 }, >> @@ -202,6 +204,21 @@ static int filter_pred_##size(struct filter_pred >> *pred, void *event) \ >> return match; \ >> } >> >> +#define DEFINE_NE_PREV_PRED(size) \ >> +static int filter_ne_prev_pred_##size(struct filter_pred *pred, void >> *event) \ >> +{ \ >> + u##size *addr = (u##size *)(event + pred->offset); \ >> + u##size val = (u##size)pred->val; \ >> + int match; \ >> + \ >> + match = (val == *addr) ^ pred->not; \ >> + \ >> + if(match == 1) \ >> + pred->val = *addr; \ >> + return match; \ >> +} >> + >> + >> DEFINE_COMPARISON_PRED(s64); >> DEFINE_COMPARISON_PRED(u64); >> DEFINE_COMPARISON_PRED(s32); >> @@ -216,6 +233,11 @@ DEFINE_EQUALITY_PRED(32); >> DEFINE_EQUALITY_PRED(16); >> DEFINE_EQUALITY_PRED(8); >> >> +DEFINE_NE_PREV_PRED(64); >> +DEFINE_NE_PREV_PRED(32); >> +DEFINE_NE_PREV_PRED(16); >> +DEFINE_NE_PREV_PRED(8); >> + >> /* Filter predicate for fixed sized arrays of characters */ >> static int filter_pred_string(struct filter_pred *pred, void *event) >> { >> @@ -980,7 +1002,7 @@ int filter_assign_type(const char *type) >> static bool is_legal_op(struct ftrace_event_field *field, enum >> filter_op_ids op) >> { >> if (is_string_field(field) && >> - (op != OP_EQ && op != OP_NE && op != OP_GLOB)) >> + (op != OP_EQ && op != OP_NE && op != OP_GLOB && op != OP_NE_PREV)) >> return false; >> if (!is_string_field(field) && op == OP_GLOB) >> return false; >> @@ -997,6 +1019,8 @@ static filter_pred_fn_t select_comparison_fn(enum >> filter_op_ids op, >> case 8: >> if (op == OP_EQ || op == OP_NE) >> fn = filter_pred_64; >> + else if (op == OP_NE_PREV) >> + fn = filter_ne_prev_pred_64; >> else if (field_is_signed) >> fn = pred_funcs_s64[op - PRED_FUNC_START]; >> else >> @@ -1005,6 +1029,8 @@ static filter_pred_fn_t >> select_comparison_fn(enum filter_op_ids op, >> case 4: >> if (op == OP_EQ || op == OP_NE) >> fn = filter_pred_32; >> + else if (op == OP_NE_PREV) >> + fn = filter_ne_prev_pred_32; >> else if (field_is_signed) >> fn = pred_funcs_s32[op - PRED_FUNC_START]; >> else >> @@ -1013,6 +1039,8 @@ static filter_pred_fn_t >> select_comparison_fn(enum filter_op_ids op, >> case 2: >> if (op == OP_EQ || op == OP_NE) >> fn = filter_pred_16; >> + else if (op == OP_NE_PREV) >> + fn = filter_ne_prev_pred_16; >> else if (field_is_signed) >> fn = pred_funcs_s16[op - PRED_FUNC_START]; >> else >> @@ -1021,6 +1049,8 @@ static filter_pred_fn_t >> select_comparison_fn(enum filter_op_ids op, >> case 1: >> if (op == OP_EQ || op == OP_NE) >> fn = filter_pred_8; >> + else if (op == OP_NE_PREV) >> + fn = filter_ne_prev_pred_8; >> else if (field_is_signed) >> fn = pred_funcs_s8[op - PRED_FUNC_START]; >> else >> @@ -1088,7 +1118,7 @@ static int init_pred(struct filter_parse_state *ps, >> } >> } >> >> - if (pred->op == OP_NE) >> + if (pred->op == OP_NE || pred->op == OP_NE_PREV) >> pred->not ^= 1; >> >> pred->fn = fn; >> @@ -2197,7 +2227,7 @@ static int ftrace_function_check_pred(struct >> filter_pred *pred, int leaf) >> * - only '==' and '!=' is used >> * - the 'ip' field is used >> */ >> - if ((pred->op != OP_EQ) && (pred->op != OP_NE)) >> + if ((pred->op != OP_EQ) && (pred->op != OP_NE) && (pred->op >> != OP_NE_PREV)) >> return -EINVAL; >> >> if (strcmp(field->name, "ip")) >> >> >> I'm new to upstream kernel development. Please let me any suggestion. >> >> Many thanks >> Tamim >> PhD Candidate, >> Kent State University > > > -- > Masami Hiramatsu -- Tamim PhD Candidate, Kent State University http://web.cs.kent.edu/~mislam4/