Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932934AbbBQIdy (ORCPT ); Tue, 17 Feb 2015 03:33:54 -0500 Received: from e8.ny.us.ibm.com ([32.97.182.138]:38296 "EHLO e8.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756272AbbBQIds (ORCPT ); Tue, 17 Feb 2015 03:33:48 -0500 Date: Tue, 17 Feb 2015 00:33:12 -0800 From: Sukadev Bhattiprolu To: Peter Zijlstra Cc: mingo@kernel.org, Michael Ellerman , Anton Blanchard , Stephane Eranian , Jiri Olsa , Arnaldo Carvalho de Melo , linux-kernel@vger.kernel.org Subject: Re: [RFC][PATCH] perf: Implement read_group() PMU operation Message-ID: <20150217083312.GA16221@us.ibm.com> References: <20150206025915.GA31650@us.ibm.com> <20150212155856.GC21418@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150212155856.GC21418@twins.programming.kicks-ass.net> X-Operating-System: Linux 2.0.32 on an i486 User-Agent: Mutt/1.5.21 (2010-09-15) X-TM-AS-MML: disable X-Content-Scanned: Fidelis XPS MAILER x-cbid: 15021708-0029-0000-0000-00000202EC59 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3658 Lines: 119 Peter Zijlstra [peterz@infradead.org] wrote: | > --- a/kernel/events/core.c | > +++ b/kernel/events/core.c | > @@ -3549,10 +3549,43 @@ static int perf_event_read_group(struct perf_event *event, | | You also want perf_output_read_group(). Ok. Will look into it. We currently don't support sampling with the 24x7 counters but we should make sure that the new interface fits correctly. | | > struct perf_event *leader = event->group_leader, *sub; | > int n = 0, size = 0, ret = -EFAULT; | > struct perf_event_context *ctx = leader->ctx; | > + u64 *valuesp; | > u64 values[5]; | > + int use_group_read; | > u64 count, enabled, running; | > + struct pmu *pmu = event->pmu; | > + | > + /* | > + * If PMU supports group read and group read is requested, | > + * allocate memory before taking the mutex. | > + */ | > + use_group_read = 0; | > + if ((read_format & PERF_FORMAT_GROUP) && pmu->read_group) { | > + use_group_read++; | > + } | > + | > + if (use_group_read) { | > + valuesp = kzalloc(leader->nr_siblings * sizeof(u64), GFP_KERNEL); | > + if (!valuesp) | > + return -ENOMEM; | > + } | | This seems 'sad', the hardware already knows how many it can maximally | use at once and can preallocate, right? Yeah :-) In a subsequent version, I got rid of the allocation by moving the copy_to_user() into the PMU, but still needed to walk the sibling list. | | > | > mutex_lock(&ctx->mutex); | > + | > + if (use_group_read) { | > + ret = pmu->read_group(leader, valuesp, leader->nr_siblings); | > + if (ret >= 0) { | > + size = ret * sizeof(u64); | > + | > + ret = size; | > + if (copy_to_user(buf, valuesp, size)) | > + ret = -EFAULT; | > + } | > + | > + kfree(valuesp); | > + goto unlock; | > + } | > + | > count = perf_event_read_value(leader, &enabled, &running); | > | > values[n++] = 1 + leader->nr_siblings; | | Since ->read() has a void return value, we can delay its effect, so I'm | currently thinking we might want to extend the transaction interface for | this; give pmu::start_txn() a flags argument to indicate scheduling | (add) or reading (read). | | So we'd end up with something like: | | pmu->start_txn(pmu, PMU_TXN_READ); | | leader->read(); | | for_each_sibling() | sibling->read(); | | pmu->commit_txn(); So, each of the ->read() calls is really "appending a counter" to a list of counters that the PMU should read and the values for the counters (results of the read) are only available after the commit_txn() right? In which case, perf_event_read_group() would then follow this commit_txn() with its "normal" read, and the PMU would return the result cached during ->commit_txn(). If so, we need a way to invalidate the cached result ? | | after which we can use the values updated by the read calls. The trivial | no-support implementation lets read do its immediate thing like it does | now. | | A more complex driver can then collect the actual counter values and | execute one hypercall using its pre-allocated memory. the hypercall should happen in the ->commit_txn() right ? | | So no allocations in the core code, and no sibling iterations in the | driver code. | | Would that work for you? I think it would, I am working on breaking up the underlying code along start/read/commit lines, and hope to have it done later this week and then can experiment more with this interface. Appreciate the input. Sukadev -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/