LinuxLists.cc - Re: [RFC PATCH 5/7] perf tools: Support setting different slots in a BPF map separately

2015-11-20 13:29:10

Subject: Re: [RFC PATCH 5/7] perf tools: Support setting different slots in a BPF map separately

On 2015/10/17 18:48, Wang Nan wrote:
> This patch introduces basic facilities to support config different
> slots in a BPF map one by one.
>
> nr_indics and indics are introduced into 'struct parse_events_term',
> where indics is an array of indics which will be configured by this
> config term, nr_indics is the size of the array. The array is passed
> to 'struct bpf_map_priv'. To indicate the new type of configuration,
> BPF_MAP_PRIV_KEY_INDICS is added as a new key type.
> bpf_map_config_foreach_key() is extended to iterate over those indics
> instead of all possible keys.
>
> Signed-off-by: Wang Nan <[email protected]>
> Cc: Arnaldo Carvalho de Melo <[email protected]>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Brendan Gregg <[email protected]>
> Cc: Daniel Borkmann <[email protected]>
> Cc: David Ahern <[email protected]>
> Cc: He Kuang <[email protected]>
> Cc: Jiri Olsa <[email protected]>
> Cc: Kaixu Xia <[email protected]>
> Cc: Masami Hiramatsu <[email protected]>
> Cc: Namhyung Kim <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Zefan Li <[email protected]>
> Cc: [email protected]
> ---
> tools/perf/util/bpf-loader.c | 68 +++++++++++++++++++++++++++++++++++++++++-
> tools/perf/util/parse-events.c | 4 ++-
> tools/perf/util/parse-events.h | 2 ++
> 3 files changed, 72 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index 15cf27a..023fc12 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -638,6 +638,7 @@ int bpf__foreach_tev(struct bpf_object *obj,
>
> enum bpf_map_priv_key_type {
> BPF_MAP_PRIV_KEY_ALL,
> + BPF_MAP_PRIV_KEY_INDICS,
> };
>
> enum bpf_map_priv_value_type {
> @@ -647,6 +648,12 @@ enum bpf_map_priv_value_type {
> struct bpf_map_priv {
> struct {
> enum bpf_map_priv_key_type type;
> + union {
> + struct {
> + size_t nr_indics;
> + u64 *indics;
> + } indics;
> + };
> } key;
>
> struct {
> @@ -663,6 +670,8 @@ bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
> {
> struct bpf_map_priv *priv = _priv;
>
> + if (priv->key.type == BPF_MAP_PRIV_KEY_INDICS)
> + zfree(&priv->key.indics.indics);
> free(priv);
> }
>
> @@ -718,6 +727,20 @@ bpf_map_config_foreach_key(struct bpf_map *map,
> }
> }
> return 0;
> + case BPF_MAP_PRIV_KEY_INDICS:
> + for (i = 0; i < priv->key.indics.nr_indics; i++) {
> + u64 _idx = priv->key.indics.indics[i];
> + unsigned int idx = (unsigned int)(_idx);
> +
> + err = (*func)(name, map_fd, &def,
> + priv, &idx, arg);
> + if (err) {
> + pr_debug("ERROR: failed to insert value to %s[%u]\n",
> + name, idx);
> + return err;
> + }
> + }

This for-loop has a potential problem that, if perf's user want to
set a very big array using indices, for example:

# perf record -e
mybpf.c/maps:mymap:values[1,2,3,10-100000,200000-400000]=3/
mybpf.c/maps:mymap:values[100000-200000]=3/ ...

Perf would alloc nearly 300000 slots for indices array, consume too much
memory.

I will fix this problem by reinterprete indices array, makes negative
value represent range start and use next slot to store range size. For
example, the above perf cmdline can be converted to:

{1,2,3,-10, 99991,-200000,200001} and {-100000,100001}.

Thank you.

2015-11-20 15:36:13

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [RFC PATCH 5/7] perf tools: Support setting different slots in a BPF map separately

Em Fri, Nov 20, 2015 at 09:25:36PM +0800, Wangnan (F) escreveu:
> >+ case BPF_MAP_PRIV_KEY_INDICS:
> >+ for (i = 0; i < priv->key.indics.nr_indics; i++) {
> >+ u64 _idx = priv->key.indics.indics[i];
> >+ unsigned int idx = (unsigned int)(_idx);
> >+
> >+ err = (*func)(name, map_fd, &def,
> >+ priv, &idx, arg);
> >+ if (err) {
> >+ pr_debug("ERROR: failed to insert value to %s[%u]\n",
> >+ name, idx);
> >+ return err;
> >+ }
> >+ }
>
> This for-loop has a potential problem that, if perf's user want to
> set a very big array using indices, for example:
>
> # perf record -e
> mybpf.c/maps:mymap:values[1,2,3,10-100000,200000-400000]=3/
> mybpf.c/maps:mymap:values[100000-200000]=3/ ...
>
> Perf would alloc nearly 300000 slots for indices array, consume too much
> memory.
>
> I will fix this problem by reinterprete indices array, makes negative
> value represent range start and use next slot to store range size. For
> example, the above perf cmdline can be converted to:
>
> {1,2,3,-10, 99991,-200000,200001} and {-100000,100001}.

Why is that changing the way you specify what entries should be set to
a value will make it not allocate too much memory?

I found the first form of representing ( start-end ) to be better than (
-start, size ), but I would use what the C language uses for expressing
ranges in switch case ranges, which is familiar and doesn't reuses the
minus arithmetic operator to express a range, i.e.:

# perf record -e \
mybpf.c/maps:mymap:values[1,2,3,10..100000,200000..400000]=3/

# perf record -e \
mybpf.c/maps:mymap:values[100000..200000]=3/ ...

- Arnaldo

2015-11-23 02:03:11

by Wang Nan

[permalink] [raw]

Subject: Re: [RFC PATCH 5/7] perf tools: Support setting different slots in a BPF map separately

On 2015/11/20 23:34, Arnaldo Carvalho de Melo wrote:
> Em Fri, Nov 20, 2015 at 09:25:36PM +0800, Wangnan (F) escreveu:
>>> + case BPF_MAP_PRIV_KEY_INDICS:
>>> + for (i = 0; i < priv->key.indics.nr_indics; i++) {
>>> + u64 _idx = priv->key.indics.indics[i];
>>> + unsigned int idx = (unsigned int)(_idx);
>>> +
>>> + err = (*func)(name, map_fd, &def,
>>> + priv, &idx, arg);
>>> + if (err) {
>>> + pr_debug("ERROR: failed to insert value to %s[%u]\n",
>>> + name, idx);
>>> + return err;
>>> + }
>>> + }
>> This for-loop has a potential problem that, if perf's user want to
>> set a very big array using indices, for example:
>>
>> # perf record -e
>> mybpf.c/maps:mymap:values[1,2,3,10-100000,200000-400000]=3/
>> mybpf.c/maps:mymap:values[100000-200000]=3/ ...
>>
>> Perf would alloc nearly 300000 slots for indices array, consume too much
>> memory.
>>
>> I will fix this problem by reinterprete indices array, makes negative
>> value represent range start and use next slot to store range size. For
>> example, the above perf cmdline can be converted to:
>>
>> {1,2,3,-10, 99991,-200000,200001} and {-100000,100001}.
> Why is that changing the way you specify what entries should be set to
> a value will make it not allocate too much memory?

It is actually a problem in the next patch, in which it expand all range
into a series of indices. If user wants 1-10000, it creates an array as
[1,2,3,4,...10000], so user is possible to use a simple cmdline to consume
all of available memory.

However, the method I described above is not the best way to solve this
probelm.
I thought yesterday that we should not insist on indices array. We can
make parser always return ranges. For example, [1,2,3-5] can be represent
using [(1,1), (2,1), (3,3)], so we don't need the above ugly negative
indicators.

> I found the first form of representing ( start-end ) to be better than (
> -start, size ), but I would use what the C language uses for expressing
> ranges in switch case ranges, which is familiar and doesn't reuses the
> minus arithmetic operator to express a range, i.e.:
>
> # perf record -e \
> mybpf.c/maps:mymap:values[1,2,3,10..100000,200000..400000]=3/
>
> # perf record -e \
> mybpf.c/maps:mymap:values[100000..200000]=3/ ...

'..' is better.

Thank you.

> - Arnaldo

2015-11-23 05:46:38

by Wang Nan

[permalink] [raw]

Subject: Re: [RFC PATCH 5/7] perf tools: Support setting different slots in a BPF map separately

On 2015/11/23 10:01, Wangnan (F) wrote:
>
>
> On 2015/11/20 23:34, Arnaldo Carvalho de Melo wrote:
>> Em Fri, Nov 20, 2015 at 09:25:36PM +0800, Wangnan (F) escreveu:
>>>> + case BPF_MAP_PRIV_KEY_INDICS:
>>>> + for (i = 0; i < priv->key.indics.nr_indics; i++) {
>>>> + u64 _idx = priv->key.indics.indics[i];
>>>> + unsigned int idx = (unsigned int)(_idx);
>>>> +
>>>> + err = (*func)(name, map_fd, &def,
>>>> + priv, &idx, arg);
>>>> + if (err) {
>>>> + pr_debug("ERROR: failed to insert value to
>>>> %s[%u]\n",
>>>> + name, idx);
>>>> + return err;
>>>> + }
>>>> + }
>>> This for-loop has a potential problem that, if perf's user want to
>>> set a very big array using indices, for example:
>>>
>>> # perf record -e
>>> mybpf.c/maps:mymap:values[1,2,3,10-100000,200000-400000]=3/
>>> mybpf.c/maps:mymap:values[100000-200000]=3/ ...
>>>
>>> Perf would alloc nearly 300000 slots for indices array, consume too
>>> much
>>> memory.
>>>
>>> I will fix this problem by reinterprete indices array, makes negative
>>> value represent range start and use next slot to store range size. For
>>> example, the above perf cmdline can be converted to:
>>>
>>> {1,2,3,-10, 99991,-200000,200001} and {-100000,100001}.
>> Why is that changing the way you specify what entries should be set to
>> a value will make it not allocate too much memory?
>
> It is actually a problem in the next patch, in which it expand all range
> into a series of indices. If user wants 1-10000, it creates an array as
> [1,2,3,4,...10000], so user is possible to use a simple cmdline to
> consume
> all of available memory.
>
> However, the method I described above is not the best way to solve
> this probelm.
> I thought yesterday that we should not insist on indices array. We can
> make parser always return ranges. For example, [1,2,3-5] can be represent
> using [(1,1), (2,1), (3,3)], so we don't need the above ugly negative
> indicators.
>
>> I found the first form of representing ( start-end ) to be better than (
>> -start, size ), but I would use what the C language uses for expressing
>> ranges in switch case ranges, which is familiar and doesn't reuses the
>> minus arithmetic operator to express a range, i.e.:
>>
>> # perf record -e \
>> mybpf.c/maps:mymap:values[1,2,3,10..100000,200000..400000]=3/
>>
>> # perf record -e \
>> mybpf.c/maps:mymap:values[100000..200000]=3/ ...
>
> '..' is better.
>

One problem: the case range syntax is introduced by gcc extension, not a
part of standard, and should be '...'.

Please see: https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html

So I'll use '...' also.

Thank you.

> Thank you.
>
>> - Arnaldo
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/