From: kan.liang@intel.com
To: a.p.zijlstra@chello.nl, linux-kernel@vger.kernel.org
Cc: eranian@google.com, ak@linux.intel.com, Kan Liang
Subject: [PATCH V2 1/1] perf, core: Use sample period avg as child event's initial period
Date: Tue, 16 Dec 2014 14:25:18 -0500
Message-Id: <1418757918-27893-1-git-send-email-kan.liang@intel.com>
X-Mailer: git-send-email 1.8.3.2

From: Kan Liang

In perf record frequency mode, the initial sample_period is 1. That is
because perf does not yet know what period should be set, so it uses the
minimum period, 1, as the first period. This triggers an interrupt soon,
after which there is enough data to calculate the period for the given
frequency. But too many very short periods like 1 can cause various
problems and increase overhead. It is better to limit the period of 1 to
just the first few period adjustments.

However, for some workloads a period of 1 is set frequently. For
example, perf record a busy loop for 10 seconds:

perf record ./finity_busy_loop.sh 10

while [ "A" != "B" ]
do
	date > /dev/null
done

The period was changed 150503 times in 10 seconds, and 22.5% (33861
times) of those settings were a period of 1. That is because, in
inherit_event, the period for a child event is inherited from its
parent's parent event, which is usually the default sample_period of 1.
Each child event then has to recalculate the period from 1 every time,
which brings high overhead.

This patch keeps a sample period average in the parent event.
Each new child event can use it as its initial sample period.

A new field, avg_sample_period, is introduced in struct perf_event to
track the average sample period. It is kept in the parent event, since
that is the only node all children know about. To reduce contention on
the parent event, the value is updated only once per tick. We also want
to get rid of the period of 1 as early as possible, so the value is
updated on the first period adjustment as well.

Each new child event takes the parent event's average sample period as
its first sample period. Each new child event also increments the parent
event's refcount, so the parent cannot go away until all children do,
and accessing the parent is safe.

After applying this patch, the rate of period-1 settings drops to 0.1%.

Signed-off-by: Kan Liang
---
Changes since V1
 - Keep the average sample period in the parent event
 - Use atomic64_t to replace local64_t
 - Only update the average every tick, or on the first adjustment

 include/linux/perf_event.h |  2 ++
 kernel/events/core.c       | 21 +++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 486e84c..bb886f0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -403,6 +403,8 @@ struct perf_event {
 	struct list_head		child_list;
 	struct perf_event		*parent;
 
+	atomic64_t			avg_sample_period;
+
 	int				oncpu;
 	int				cpu;
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index af0a5ba..3404d52 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2796,6 +2796,7 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 	struct hw_perf_event *hwc = &event->hw;
 	s64 period, sample_period;
 	s64 delta;
+	struct perf_event *head_event = (event->parent != NULL) ? event->parent : event;
 
 	period = perf_calculate_period(event, nsec, count);
 
@@ -2809,6 +2810,17 @@ static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bo
 
 	hwc->sample_period = sample_period;
 
+	/*
+	 * Update the AVG sample period in the first period adjustment.
+	 * Then update it every tick to reduce contention.
+	 */
+	if (!disable || (atomic64_read(&head_event->avg_sample_period) == 1)) {
+		s64 avg_period;
+
+		avg_period = (atomic64_read(&head_event->avg_sample_period) + sample_period) / 2;
+		atomic64_set(&head_event->avg_sample_period, avg_period);
+	}
+
 	if (local64_read(&hwc->period_left) > 8*sample_period) {
 		if (disable)
 			event->pmu->stop(event, PERF_EF_UPDATE);
@@ -7030,8 +7042,13 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	hwc = &event->hw;
 
 	hwc->sample_period = attr->sample_period;
-	if (attr->freq && attr->sample_freq)
+	if (attr->freq && attr->sample_freq) {
 		hwc->sample_period = 1;
+		if (parent_event)
+			hwc->sample_period = atomic64_read(&parent_event->avg_sample_period);
+		else
+			atomic64_set(&event->avg_sample_period, hwc->sample_period);
+	}
 	hwc->last_period = hwc->sample_period;
 
 	local64_set(&hwc->period_left, hwc->sample_period);
@@ -7902,7 +7919,7 @@ inherit_event(struct perf_event *parent_event,
 	child_event->state = PERF_EVENT_STATE_OFF;
 
 	if (parent_event->attr.freq) {
-		u64 sample_period = parent_event->hw.sample_period;
+		u64 sample_period = atomic64_read(&parent_event->avg_sample_period);
 		struct hw_perf_event *hwc = &child_event->hw;
 
 		hwc->sample_period = sample_period;
-- 
1.8.3.2