by Bayduraev, Alexey V


Subject: Re: [PATCH v4 00/12] Introduce threaded trace streaming for basic perf record operation

On 09.04.2021 0:52, Jiri Olsa wrote:
> On Tue, Apr 06, 2021 at 11:37:26AM +0300, Bayduraev, Alexey V wrote:
>>
>> Changes in v4:
>> - renamed 'comm' structure to 'pipes'
>> - moved thread fd/maps messages to verbose=2
>> - fixed leaks during allocation of thread_data structures
>> - fixed leaks during allocation of thread masks
>> - fixed possible fails when releasing thread masks
>>
>> v3: https://lore.kernel.org/lkml/[email protected]/
>
> hi,
> I recall there was some issue wrt threading and intel_pt,
> which we either need to fix or we need to disable threads
> for it
>
> [root@krava perf]# ./perf record -e intel_pt// --threads=cpu
> ^C[ perf record: Woken up 121 times to write data ]
> Warning:
> AUX data lost 95 times out of 206!
>
> [ perf record: Captured and wrote 211.364 MB perf.data ]
>
> [root@krava perf]# ./perf script
> Segmentation fault (core dumped)
>
> the fix should already be in the perf/record_threads branch,

Thanks,

As far as I can see, the fix from perf/record_threads is only partially
applied here: the changes in util/auxtrace.c and the setting of
one_mmap_addr/offset are missing. I will fix this.

I will also try to refactor patches 11 and 12.

Regards,
Alexey

>
> jirka
>

2021-04-13 12:27:48

by Namhyung Kim


Subject: Re: [PATCH v4 08/12] perf record: introduce --threads=<spec> command line option

Hello,

On Tue, Apr 6, 2021 at 5:49 PM Bayduraev, Alexey V
<[email protected]> wrote:
>
>
> Provide a --threads option in the perf record command line interface.
> The option can have a value in the form of masks that specify
> the cpus to be monitored by data streaming threads and their layout
> in the system topology. The masks can be filtered using the cpu mask
> provided via the -C option.
>
> The specification value can be a user-defined list of masks. Masks
> separated by colons define the cpus to be monitored by one thread, and
> the affinity mask of that thread is separated by a slash. For example:
> <cpus mask 1>/<affinity mask 1>:<cpus mask 2>/<affinity mask 2>
> specifies a parallel threads layout that consists of two threads
> with the corresponding assigned cpus to be monitored.
>
> The specification value can also be a string, e.g. "cpu", "core" or
> "socket", meaning creation of a data streaming thread for every
> cpu, core or socket to monitor distinct cpus or cpus grouped
> by core or socket.
>
> The option provided with no value or an empty value defaults to a
> per-cpu parallel threads layout, creating a data streaming thread for
> every cpu being monitored.
>
> Feature design and implementation are based on prototypes [1], [2].
>
> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git -b perf/record_threads
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> Suggested-by: Jiri Olsa <[email protected]>
> Suggested-by: Namhyung Kim <[email protected]>
> Signed-off-by: Alexey Bayduraev <[email protected]>
> ---
[SNIP]
> +static int record__init_thread_masks_spec(struct record *rec, struct perf_cpu_map *cpus,
> + char **maps_spec, char **affinity_spec, u32 nr_spec)
> +{
> + u32 s;
> + int ret, nr_threads = 0;
> + struct mmap_cpu_mask cpus_mask;
> + struct thread_mask thread_mask, full_mask;
> +
> + ret = record__mmap_cpu_mask_alloc(&cpus_mask, cpu__max_cpu());
> + if (ret)
> + return ret;
> + record__mmap_cpu_mask_init(&cpus_mask, cpus);
> + ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu());
> + if (ret)
> + goto out_free_cpu_mask;
> + ret = record__thread_mask_alloc(&full_mask, cpu__max_cpu());
> + if (ret)
> + goto out_free_thread_mask;
> + record__thread_mask_clear(&full_mask);
> +
> + for (s = 0; s < nr_spec; s++) {
> + record__thread_mask_clear(&thread_mask);
> +
> + record__mmap_cpu_mask_init_spec(&thread_mask.maps, maps_spec[s]);
> + record__mmap_cpu_mask_init_spec(&thread_mask.affinity, affinity_spec[s]);
> +
> + if (!bitmap_and(thread_mask.maps.bits, thread_mask.maps.bits,
> + cpus_mask.bits, thread_mask.maps.nbits) ||
> + !bitmap_and(thread_mask.affinity.bits, thread_mask.affinity.bits,
> + cpus_mask.bits, thread_mask.affinity.nbits))
> + continue;
> +
> + ret = record__thread_mask_intersects(&thread_mask, &full_mask);
> + if (ret)
> + return ret;

I think you should free the other masks here instead of returning directly.

> + record__thread_mask_or(&full_mask, &full_mask, &thread_mask);
> +
> + rec->thread_masks = realloc(rec->thread_masks,
> + (nr_threads + 1) * sizeof(struct thread_mask));
> + if (!rec->thread_masks) {
> + pr_err("Failed to allocate thread masks\n");
> + ret = -ENOMEM;
> + goto out_free_full_mask;

But this will leak the old rec->thread_masks, since the NULL returned by
realloc() overwrites the only pointer to it.
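
One common way to avoid this kind of leak is to store the realloc() result in a temporary and update the real pointer only on success. The following is a minimal sketch, not the patch's code: struct mask_entry and mask_array_grow() are hypothetical names standing in for struct thread_mask and the inline realloc in the loop.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct thread_mask; only its size matters here. */
struct mask_entry {
	unsigned long *bits;
	size_t nbits;
};

/*
 * Grow *arr from nr to nr + 1 elements.
 * realloc() leaves the original block untouched when it returns NULL,
 * so the result is kept in a temporary until it is known to be valid.
 */
static int mask_array_grow(struct mask_entry **arr, int nr)
{
	struct mask_entry *tmp;

	tmp = realloc(*arr, (nr + 1) * sizeof(**arr));
	if (!tmp)
		return -ENOMEM;	/* *arr is still valid and still owned by the caller */

	*arr = tmp;
	return 0;
}
```

On failure the caller still holds the old array and can free it, together with any masks stored in it, on its normal error path.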

> + }
> + rec->thread_masks[nr_threads] = thread_mask;
> + pr_debug("thread_masks[%d]: addr=", nr_threads);
> + mmap_cpu_mask__scnprintf(&rec->thread_masks[nr_threads].maps, "maps");
> + pr_debug("thread_masks[%d]: addr=", nr_threads);
> + mmap_cpu_mask__scnprintf(&rec->thread_masks[nr_threads].affinity, "affinity");
> + nr_threads++;
> + ret = record__thread_mask_alloc(&thread_mask, cpu__max_cpu());
> + if (ret)
> + return ret;

Ditto, use goto here so the masks get freed.
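
For illustration, here is a small sketch of the goto unwinding pattern with the error code carried through to the single return. As quoted, the function ends in `return 0;`, so the -ENOMEM set before `goto out_free_full_mask` would not reach the caller; the sketch returns ret instead. The function init_three() and its buffers are hypothetical and only mimic the three mask allocations.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical example: three allocations released through one exit path.
 * fail_late simulates an error that is detected after all three exist.
 */
static int init_three(size_t n, int fail_late)
{
	unsigned long *cpus, *thread, *full;
	int ret = -ENOMEM;

	cpus = calloc(n, sizeof(*cpus));
	if (!cpus)
		return ret;
	thread = calloc(n, sizeof(*thread));
	if (!thread)
		goto out_free_cpus;
	full = calloc(n, sizeof(*full));
	if (!full)
		goto out_free_thread;

	if (fail_late) {
		ret = -EINVAL;
		goto out_free_full;	/* never a bare return once resources are held */
	}

	ret = 0;

out_free_full:
	free(full);
out_free_thread:
	free(thread);
out_free_cpus:
	free(cpus);

	return ret;	/* propagate the error instead of a literal 0 */
}
```

Every exit after the first allocation passes through the labels, so each buffer is released exactly once on both the success and the error paths.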

> + }
> +
> + rec->nr_threads = nr_threads;
> + pr_debug("threads: nr_threads=%d\n", rec->nr_threads);
> +
> +out_free_full_mask:
> + record__thread_mask_free(&full_mask);
> +out_free_thread_mask:
> + record__thread_mask_free(&thread_mask);
> +out_free_cpu_mask:
> + record__mmap_cpu_mask_free(&cpus_mask);
> +
> + return 0;
> +}

[SNIP]
> +
> +static int record__init_thread_user_masks(struct record *rec, struct perf_cpu_map *cpus)
> +{
> + int t, ret;
> + u32 s, nr_spec = 0;
> + char **maps_spec = NULL, **affinity_spec = NULL;
> + char *spec, *spec_ptr, *user_spec, *mask, *mask_ptr;
> +
> + for (t = 0, user_spec = (char *)rec->opts.threads_user_spec; ; t++, user_spec = NULL) {
> + spec = strtok_r(user_spec, ":", &spec_ptr);
> + if (spec == NULL)
> + break;
> + pr_debug(" spec[%d]: %s\n", t, spec);
> + mask = strtok_r(spec, "/", &mask_ptr);
> + if (mask == NULL)
> + break;
> + pr_debug(" maps mask: %s\n", mask);
> + maps_spec = realloc(maps_spec, (nr_spec + 1) * sizeof(char *));
> + if (!maps_spec) {
> + pr_err("Failed to realloc maps_spec\n");
> + ret = -ENOMEM;
> + goto out_free_all_specs;

It'd crash once nr_spec > 0: maps_spec is NULL now, and the loop at
out_free_all_specs dereferences it.

> + }
> + maps_spec[nr_spec] = strdup(mask);

You'd better check the return value of strdup().

> + mask = strtok_r(NULL, "/", &mask_ptr);
> + if (mask == NULL)
> + break;
> + pr_debug(" affinity mask: %s\n", mask);
> + affinity_spec = realloc(affinity_spec, (nr_spec + 1) * sizeof(char *));
> + if (!maps_spec) {

s/maps_spec/affinity_spec/ in this check, and it has the same problem as above.

> + pr_err("Failed to realloc affinity_spec\n");
> + ret = -ENOMEM;
> + goto out_free_all_specs;
> + }
> + affinity_spec[nr_spec] = strdup(mask);

Check the return value.
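
To make the points above concrete, here is a rough sketch of the same "maps/affinity:maps/affinity" parsing with every realloc() and strdup() checked. It is not the patch's code: struct spec_list, spec_list_parse() and spec_list_free() are hypothetical names, and rejecting an entry without an affinity part with -EINVAL is an assumption (the quoted code simply breaks out of the loop).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the two parallel string arrays. */
struct spec_list {
	char **maps;
	char **affinity;
	unsigned int nr;
};

static void spec_list_free(struct spec_list *sl)
{
	unsigned int s;

	/* Only fully populated entries are counted in nr. */
	for (s = 0; s < sl->nr; s++) {
		free(sl->maps[s]);
		free(sl->affinity[s]);
	}
	free(sl->maps);		/* free(NULL) is a no-op */
	free(sl->affinity);
	memset(sl, 0, sizeof(*sl));
}

/* Parses user_spec in place (strtok_r writes NUL bytes into it). */
static int spec_list_parse(struct spec_list *sl, char *user_spec)
{
	char *spec, *spec_ptr, *maps, *aff, *mask_ptr;
	char *maps_dup, *aff_dup;
	char **tmp;
	int ret;

	memset(sl, 0, sizeof(*sl));

	for (spec = strtok_r(user_spec, ":", &spec_ptr); spec;
	     spec = strtok_r(NULL, ":", &spec_ptr)) {
		maps = strtok_r(spec, "/", &mask_ptr);
		aff = maps ? strtok_r(NULL, "/", &mask_ptr) : NULL;
		if (!maps || !aff) {
			ret = -EINVAL;	/* assumption: malformed entry is an error */
			goto out_free;
		}

		ret = -ENOMEM;
		/* Keep the old arrays reachable if realloc() fails. */
		tmp = realloc(sl->maps, (sl->nr + 1) * sizeof(*tmp));
		if (!tmp)
			goto out_free;
		sl->maps = tmp;
		tmp = realloc(sl->affinity, (sl->nr + 1) * sizeof(*tmp));
		if (!tmp)
			goto out_free;
		sl->affinity = tmp;

		maps_dup = strdup(maps);
		aff_dup = strdup(aff);
		if (!maps_dup || !aff_dup) {
			free(maps_dup);
			free(aff_dup);
			goto out_free;
		}
		sl->maps[sl->nr] = maps_dup;
		sl->affinity[sl->nr] = aff_dup;
		sl->nr++;	/* count the entry only once both strings exist */
	}
	return 0;

out_free:
	spec_list_free(sl);
	return ret;
}
```

Because nr is incremented only after both strings have been duplicated, the cleanup loop never touches a half-filled slot or a NULL array.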

Thanks,
Namhyung

> + nr_spec++;
> + }
> +
> + ret = record__init_thread_masks_spec(rec, cpus, maps_spec, affinity_spec, nr_spec);
> +
> +out_free_all_specs:
> + for (s = 0; s < nr_spec; s++) {
> + free(maps_spec[s]);
> + free(affinity_spec[s]);
> + }
> + free(affinity_spec);
> + free(maps_spec);
> +
> + return ret;
> +}
> +