Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp635779img; Mon, 18 Mar 2019 10:45:47 -0700 (PDT) X-Google-Smtp-Source: APXvYqx/lbgeFxXQZ95Ezg5ABLw3mhwjkJuN1kport+MZPVYLfHjWpOEHZAWpDTxo5x7PQRYwKF1 X-Received: by 2002:aa7:8c97:: with SMTP id p23mr20657013pfd.229.1552931146970; Mon, 18 Mar 2019 10:45:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1552931146; cv=none; d=google.com; s=arc-20160816; b=ALkFi/AKAeHw+x4PobDvK2hfkfKd4PbFYvXpReWMAQVteJoukI9MbHmkL634H8jK6b LKLW9G4ubYzkupD8M/J8y4r/aCZ15Nzy1sphhXw/aDkxzTCOQf84EjBSTEZwPhJ21vhl iyN7HJZ9c8OABMMq+2XUcIaYTNE1J9AxfRrPuwjFyUouSkyqYaHh0OxKru8KpyCL6y9A oZk/sJD8fqiKIF2FV33Gap5B1Q8iWALvCvyvxDiUdvJYT+Mr5M9WtfyTP1hY6KhcxiNt dxDJ454GGKf2mVNenM2C30jkEa3Xb5dKgjSRNb/yOLNaSc7W2w6U9fP28bR06g6szFqE Lg2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:organization:references:cc:to:from:subject; bh=mHGTg+NnHV1ScmO/sn5yekPSgMkHCxEezvb7E4pBBd0=; b=CsJhsHmr+WlyOaOfAqNGsz/5A/6TlZBsoD8ru0VGLicOpYOQ+kW0nYR7TsNvOOWRJF Ja/2SK6Kl2+h3wjm4yBFcElDSfVYmSOM30oKadtX+KAoDznhz0/mMQJAPizRYLeabWGx HQZaFoYrGjQgVAGMw60Hr2RDHFP52Q+5e/4fdmTAjUlh1gqI47ijyYSZIenRmUZCI/gF bFSLLea6OoxInQo5DTkvipFrDwaVy0lVSbvz6ZTCRVAFvJStghQ7lu/I13vRqjkzbsQw 10L+OGY1fcLDmhCuzPCiQO7JpO/kYmVNsN1VcpDERylULtyv4jbZzrYM19aESH0yNds9 HRww== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h2si9780789pfn.77.2019.03.18.10.45.32; Mon, 18 Mar 2019 10:45:46 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728127AbfCRRoT (ORCPT + 99 others); Mon, 18 Mar 2019 13:44:19 -0400 Received: from mga18.intel.com ([134.134.136.126]:31893 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728118AbfCRRoQ (ORCPT ); Mon, 18 Mar 2019 13:44:16 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga106.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 18 Mar 2019 10:44:16 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,494,1544515200"; d="scan'208";a="215245733" Received: from linux.intel.com ([10.54.29.200]) by orsmga001.jf.intel.com with ESMTP; 18 Mar 2019 10:44:16 -0700 Received: from [10.251.91.19] (abudanko-mobl.ccr.corp.intel.com [10.251.91.19]) by linux.intel.com (Postfix) with ESMTP id 9FC5F580238; Mon, 18 Mar 2019 10:44:13 -0700 (PDT) Subject: [PATCH v10 08/12] perf record: implement compression for AIO trace streaming From: Alexey Budankov To: Arnaldo Carvalho de Melo Cc: Jiri Olsa , Namhyung Kim , Alexander Shishkin , Peter Zijlstra , Ingo Molnar , Andi Kleen , linux-kernel References: <12cce142-6238-475b-b9aa-236531c12c2b@linux.intel.com> Organization: Intel Corp. Message-ID: <77db2b2c-5d03-dbb0-aeac-c4dd92129ab9@linux.intel.com> Date: Mon, 18 Mar 2019 20:44:12 +0300 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.1 MIME-Version: 1.0 In-Reply-To: <12cce142-6238-475b-b9aa-236531c12c2b@linux.intel.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Compression is implemented using the functions from zstd.c. As the memory to operate on the compression uses mmap->aio.data[] buffers. If Zstd streaming compression API fails for some reason the data to be compressed are just copied into the memory buffers using plain memcpy(). Compressed trace frame consists of an array of PERF_RECORD_COMPRESSED records. Each element of the array is not longer that PERF_SAMPLE_MAX_SIZE and consists of perf_event_header followed by the compressed chunk that is decompressed on the loading stage. perf_mmap__aio_push() is replaced by perf_mmap__push() which is now used in the both serial and AIO streaming cases. perf_mmap__push() is extended with positive return values to signify absence of data ready for processing. Signed-off-by: Alexey Budankov --- tools/perf/builtin-record.c | 114 ++++++++++++++++++++++++++++-------- tools/perf/util/mmap.c | 76 +----------------------- tools/perf/util/mmap.h | 12 ---- 3 files changed, 89 insertions(+), 113 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index c22e65f6b8e6..2e083891affa 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -130,6 +130,8 @@ static int record__write(struct record *rec, struct perf_mmap *map __maybe_unuse return 0; } +static int record__aio_enabled(struct record *rec); +static int record__comp_enabled(struct record *rec); static size_t zstd_compress(struct perf_session *session, void *dst, size_t dst_size, void *src, size_t src_size); @@ -183,9 +185,9 @@ static int record__aio_complete(struct perf_mmap *md, struct aiocb *cblock) if (rem_size == 0) { cblock->aio_fildes = -1; /* - * md->refcount is incremented in perf_mmap__push() for - * every enqueued aio write request so decrement it because - * the request is now complete. + * md->refcount is incremented in record__aio_pushfn() for + * every aio write request started in record__aio_push() so + * decrement it because the request is now complete. */ perf_mmap__put(md); rc = 1; @@ -240,18 +242,89 @@ static int record__aio_sync(struct perf_mmap *md, bool sync_all) } while (1); } -static int record__aio_pushfn(void *to, struct aiocb *cblock, void *bf, size_t size, off_t off) +struct record_aio { + struct record *rec; + void *data; + size_t size; +}; + +static int record__aio_pushfn(struct perf_mmap *map, void *to, void *buf, size_t size) { - struct record *rec = to; - int ret, trace_fd = rec->session->data->file.fd; + struct record_aio *aio = to; - rec->samples++; + /* + * map->base data pointed by buf is copied into free map->aio.data[] buffer + * to release space in the kernel buffer as fast as possible, calling + * perf_mmap__consume() from perf_mmap__push() function. + * + * That lets the kernel to proceed with storing more profiling data into + * the kernel buffer earlier than other per-cpu kernel buffers are handled. + * + * Coping can be done in two steps in case the chunk of profiling data + * crosses the upper bound of the kernel buffer. In this case we first move + * part of data from map->start till the upper bound and then the reminder + * from the beginning of the kernel buffer till the end of the data chunk. + */ - ret = record__aio_write(cblock, trace_fd, bf, size, off); + if (record__comp_enabled(aio->rec)) { + size = zstd_compress(aio->rec->session, aio->data + aio->size, + perf_mmap__mmap_len(map) - aio->size, + buf, size); + } else { + memcpy(aio->data + aio->size, buf, size); + } + + if (!aio->size) { + /* + * Increment map->refcount to guard map->aio.data[] buffer + * from premature deallocation because map object can be + * released earlier than aio write request started on + * map->aio.data[] buffer is complete. + * + * perf_mmap__put() is done at record__aio_complete() + * after started aio request completion or at record__aio_push() + * if the request failed to start. + */ + perf_mmap__get(map); + } + + aio->size += size; + + return size; +} + +static int record__aio_push(struct record *rec, struct perf_mmap *map, off_t *off) +{ + int ret, idx; + int trace_fd = rec->session->data->file.fd; + struct record_aio aio = { .rec = rec, .size = 0 }; + + /* + * Call record__aio_sync() to wait till map->aio.data[] buffer + * becomes available after previous aio write operation. + */ + + idx = record__aio_sync(map, false); + aio.data = map->aio.data[idx]; + ret = perf_mmap__push(map, &aio, record__aio_pushfn); + if (ret != 0) /* ret > 0 - no data, ret < 0 - error */ + return ret; + + rec->samples++; + ret = record__aio_write(&(map->aio.cblocks[idx]), trace_fd, aio.data, aio.size, *off); if (!ret) { - rec->bytes_written += size; + *off += aio.size; + rec->bytes_written += aio.size; if (switch_output_size(rec)) trigger_hit(&switch_output_trigger); + } else { + /* + * Decrement map->refcount incremented in record__aio_pushfn() + * back if record__aio_write() operation failed to start, otherwise + * map->refcount is decremented in record__aio_complete() after + * aio write operation finishes successfully. + */ + perf_mmap__put(map); } return ret; @@ -273,7 +346,7 @@ static void record__aio_mmap_read_sync(struct record *rec) struct perf_evlist *evlist = rec->evlist; struct perf_mmap *maps = evlist->mmap; - if (!rec->opts.nr_cblocks) + if (!record__aio_enabled(rec)) return; for (i = 0; i < evlist->nr_mmaps; i++) { @@ -307,13 +380,8 @@ static int record__aio_parse(const struct option *opt, #else /* HAVE_AIO_SUPPORT */ static int nr_cblocks_max = 0; -static int record__aio_sync(struct perf_mmap *md __maybe_unused, bool sync_all __maybe_unused) -{ - return -1; -} - -static int record__aio_pushfn(void *to __maybe_unused, struct aiocb *cblock __maybe_unused, - void *bf __maybe_unused, size_t size __maybe_unused, off_t off __maybe_unused) +static int record__aio_push(struct record *rec __maybe_unused, struct perf_mmap *map __maybe_unused, + off_t *off __maybe_unused) { return -1; } @@ -824,7 +892,7 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli int rc = 0; struct perf_mmap *maps; int trace_fd = rec->data.file.fd; - off_t off; + off_t off = 0; if (!evlist) return 0; @@ -850,20 +918,14 @@ static int record__mmap_read_evlist(struct record *rec, struct perf_evlist *evli map->flush = 1; } if (!record__aio_enabled(rec)) { - if (perf_mmap__push(map, rec, record__pushfn) != 0) { + if (perf_mmap__push(map, rec, record__pushfn) < 0) { if (sync) map->flush = flush; rc = -1; goto out; } } else { - int idx; - /* - * Call record__aio_sync() to wait till map->data buffer - * becomes available after previous aio write request. - */ - idx = record__aio_sync(map, false); - if (perf_mmap__aio_push(map, rec, idx, record__aio_pushfn, &off) != 0) { + if (record__aio_push(rec, map, &off) < 0) { record__aio_set_pos(trace_fd, off); if (sync) map->flush = flush; diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index d85e73fc82e2..868c0b0e909c 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -289,80 +289,6 @@ static void perf_mmap__aio_munmap(struct perf_mmap *map) zfree(&map->aio.cblocks); zfree(&map->aio.aiocb); } - -int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off), - off_t *off) -{ - u64 head = perf_mmap__read_head(md); - unsigned char *data = md->base + page_size; - unsigned long size, size0 = 0; - void *buf; - int rc = 0; - - rc = perf_mmap__read_init(md); - if (rc < 0) - return (rc == -EAGAIN) ? 0 : -1; - - /* - * md->base data is copied into md->data[idx] buffer to - * release space in the kernel buffer as fast as possible, - * thru perf_mmap__consume() below. - * - * That lets the kernel to proceed with storing more - * profiling data into the kernel buffer earlier than other - * per-cpu kernel buffers are handled. - * - * Coping can be done in two steps in case the chunk of - * profiling data crosses the upper bound of the kernel buffer. - * In this case we first move part of data from md->start - * till the upper bound and then the reminder from the - * beginning of the kernel buffer till the end of - * the data chunk. - */ - - size = md->end - md->start; - - if ((md->start & md->mask) + size != (md->end & md->mask)) { - buf = &data[md->start & md->mask]; - size = md->mask + 1 - (md->start & md->mask); - md->start += size; - memcpy(md->aio.data[idx], buf, size); - size0 = size; - } - - buf = &data[md->start & md->mask]; - size = md->end - md->start; - md->start += size; - memcpy(md->aio.data[idx] + size0, buf, size); - - /* - * Increment md->refcount to guard md->data[idx] buffer - * from premature deallocation because md object can be - * released earlier than aio write request started - * on mmap->data[idx] is complete. - * - * perf_mmap__put() is done at record__aio_complete() - * after started request completion. - */ - perf_mmap__get(md); - - md->prev = head; - perf_mmap__consume(md); - - rc = push(to, &md->aio.cblocks[idx], md->aio.data[idx], size0 + size, *off); - if (!rc) { - *off += size0 + size; - } else { - /* - * Decrement md->refcount back if aio write - * operation failed to start. - */ - perf_mmap__put(md); - } - - return rc; -} #else /* !HAVE_AIO_SUPPORT */ static int perf_mmap__aio_enabled(struct perf_mmap *map __maybe_unused) { @@ -566,7 +492,7 @@ int perf_mmap__push(struct perf_mmap *md, void *to, rc = perf_mmap__read_init(md); if (rc < 0) - return (rc == -EAGAIN) ? 0 : -1; + return (rc == -EAGAIN) ? 1 : -1; size = md->end - md->start; diff --git a/tools/perf/util/mmap.h b/tools/perf/util/mmap.h index 4e2f58d95c1f..274ce389cd84 100644 --- a/tools/perf/util/mmap.h +++ b/tools/perf/util/mmap.h @@ -101,18 +101,6 @@ union perf_event *perf_mmap__read_event(struct perf_mmap *map); int perf_mmap__push(struct perf_mmap *md, void *to, int push(struct perf_mmap *map, void *to, void *buf, size_t size)); -#ifdef HAVE_AIO_SUPPORT -int perf_mmap__aio_push(struct perf_mmap *md, void *to, int idx, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off), - off_t *off); -#else -static inline int perf_mmap__aio_push(struct perf_mmap *md __maybe_unused, void *to __maybe_unused, int idx __maybe_unused, - int push(void *to, struct aiocb *cblock, void *buf, size_t size, off_t off) __maybe_unused, - off_t *off __maybe_unused) -{ - return 0; -} -#endif size_t perf_mmap__mmap_len(struct perf_mmap *map); -- 2.20.1