From: Wang Nan
CC: Wang Nan, Adrian Hunter, David Ahern, Ingo Molnar, Peter Zijlstra, Yunlong Song
Subject: [RFC PATCH] perf/core: Put size of a sample at the end of it
Date: Wed, 2 Dec 2015 13:38:19 +0000
Message-ID: <1449063499-236703-1-git-send-email-wangnan0@huawei.com>
In-Reply-To: <565EAAFD.3000103@huawei.com>

This is an RFC patch for the overwrite-mode ring buffer. I'd like to discuss whether this new idea for retrieving as many events as possible from an overwrite-mode ring buffer is correct. If there's no fundamental problem, I'll start the perf-side work.

The biggest problem with the overwrite ring buffer is that it is hard to find the start position of a valid record. [1] and [2] try to solve this problem by introducing 'tail' and 'next_tail' into the metadata page and updating them each time the ring buffer becomes half full. This adds more instructions to the event output code path and hurts performance. In addition, even with them we are still unable to recover all possible records.
For example:

    data_tail                        head
    |                                |
    V                                V
    +------+-------+----------+------+---+
    |  A   |   B   |    C     |  D   |   |
    +------+-------+----------+------+---+

If a record E is written at the head pointer and overwrites record A:

       head        data_tail
       |           |
       V           V
    +--+---+-------+----------+------+---+
    |E |...|   B   |    C     |  D   | E |
    +--+---+-------+----------+------+---+

Record B is still valid, but we can't reach it through data_tail.

This patch suggests a different solution to this problem: by appending the length of each record at its end, a user program can retrieve every possible record by walking backward, with no need to save a tail pointer. For example:

       head
       |
       V
    +--+---+-------+----------+------+---+
    |E6|...|   B  8|     C  11|  D  7|E..|
    +--+---+-------+----------+------+---+

In this case, starting from the 'head' pointer provided by the kernel, the user program can first read '6' via *(u16 *)(head - sizeof(u16)), which gives the start position of record E. It can then read the size stored just before that position to find the start of record D, and find C and B in the same way.

The kernel-side implementation is easy: simply add a PERF_SAMPLE_SIZE sample type that outputs the size.

This solution requires the user program (perf) to do more work. At least the following requirements and limitations should be considered:

 1. Before reading such a ring buffer, perf must ensure that all events which may output to it have already been stopped, so that the 'head' pointer it gets marks the end of the last record.
 2. We must ensure that all events attached to this ring buffer have PERF_SAMPLE_SIZE selected.
 3. No tracking events may output to this ring buffer.
 4. Each record needs 2 bytes of extra space.

Further improvements can be made:

 1. If PERF_SAMPLE_SIZE is selected, we can avoid outputting the event size in the header, which eliminates the extra space cost;
 2. We can find a way to append size information to tracking events as well.
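The backward walk described above can be sketched as a small user-space simulation. This is only a sketch under stated assumptions, not perf or kernel code. The 64-byte ring, the helper names (struct ring, emit, walk_backward) and the simplified record layout (tag bytes followed by a trailing u16 total size, with no perf_event_header) are all hypothetical. Following requirement 1, it assumes all writers are stopped, so head marks the end of the last record. The stopping rule is also an assumption: a record is accepted only if it lies entirely within the last RB_SIZE bytes before head.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simulation, not kernel or perf code: a 64-byte ring
 * written in overwrite mode. 'head' only grows; logical offset x
 * lives at data[x & RB_MASK]. RB_SIZE must be a power of two. */
#define RB_SIZE 64u
#define RB_MASK (RB_SIZE - 1u)

struct ring {
	uint8_t data[RB_SIZE];
	uint64_t head;		/* end of the last record written */
};

static void rb_write(struct ring *rb, uint64_t off, const void *src, size_t n)
{
	const uint8_t *p = src;

	for (size_t i = 0; i < n; i++)
		rb->data[(off + i) & RB_MASK] = p[i];
}

static void rb_read(const struct ring *rb, uint64_t off, void *dst, size_t n)
{
	uint8_t *p = dst;

	for (size_t i = 0; i < n; i++)
		p[i] = rb->data[(off + i) & RB_MASK];
}

/* Simplified record: 'len' copies of 'tag', then the total record size
 * as a trailing u16 (the role PERF_SAMPLE_SIZE plays in the patch). */
static void emit(struct ring *rb, uint8_t tag, uint16_t len)
{
	uint16_t size = (uint16_t)(len + sizeof(uint16_t));

	assert(len >= 1 && size <= RB_SIZE);
	for (uint16_t i = 0; i < len; i++)
		rb_write(rb, rb->head + i, &tag, 1);
	rb_write(rb, rb->head + len, &size, sizeof(size));
	rb->head += size;
}

/* Walk backward from head, storing each intact record's tag, newest
 * first. A record counts as intact only if it lies entirely within the
 * last RB_SIZE bytes before head (assumed stopping rule). */
static size_t walk_backward(const struct ring *rb, uint8_t *tags, size_t max)
{
	uint64_t end = rb->head;
	size_t n = 0;

	while (n < max && end >= sizeof(uint16_t)) {
		uint16_t size;
		uint64_t start;

		/* The trailing size field itself must not be overwritten. */
		if (rb->head - (end - sizeof(uint16_t)) > RB_SIZE)
			break;
		rb_read(rb, end - sizeof(uint16_t), &size, sizeof(size));
		if (size <= sizeof(uint16_t) || size > end)
			break;		/* garbage or start of stream */
		start = end - size;
		if (rb->head - start > RB_SIZE)
			break;		/* start of this record was overwritten */
		rb_read(rb, start, &tags[n++], 1);
		end = start;
	}
	return n;
}
```

Writing A, B, C and D (58 bytes in total) and then a 10-byte E wraps around and clobbers the start of A. Walking back from head then yields E, D, C and B, matching the second diagram: B is still recoverable, while A is lost.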
[1] http://lkml.kernel.org/r/20130708121557.GA17211@twins.programming.kicks-ass.net
[2] http://lkml.kernel.org/r/20151023151205.GW11639@twins.programming.kicks-ass.net

Signed-off-by: Wang Nan
Cc: Adrian Hunter
Cc: Arnaldo Carvalho de Melo
Cc: David Ahern
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Yunlong Song
---
 include/uapi/linux/perf_event.h | 3 ++-
 kernel/events/core.c            | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 1afe962..c4066da 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -139,8 +139,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
+	PERF_SAMPLE_SIZE			= 1U << 19,
 
-	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5854fcf..bbbacec 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5473,6 +5473,9 @@ void perf_output_sample(struct perf_output_handle *handle,
 		}
 	}
 
+	if (sample_type & PERF_SAMPLE_SIZE)
+		perf_output_put(handle, header->size);
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -5592,6 +5595,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->size += size;
 	}
+
+	if (sample_type & PERF_SAMPLE_SIZE)
+		header->size += sizeof(u16);
 }
 
 void perf_event_output(struct perf_event *event,
-- 
1.8.3.4