From: Stephane Eranian
Date: Tue, 29 Nov 2016 09:20:10 -0800
Subject: Re: [PATCH] perf/x86: fix event counter update issue
To: Peter Zijlstra
Cc: "Liang, Kan", "mingo@redhat.com", LKML, Alexander Shishkin,
 "ak@linux.intel.com", "Odzioba, Lukasz"
In-Reply-To: <20161129092520.GB3092@twins.programming.kicks-ass.net>
References: <1480361206-1702-1-git-send-email-kan.liang@intel.com>
 <20161129092520.GB3092@twins.programming.kicks-ass.net>

On Tue, Nov 29, 2016 at 1:25 AM, Peter Zijlstra wrote:
>
> So caveat that I'm ill and cannot think much..
>
> On Mon, Nov 28, 2016 at 11:26:46AM -0800, kan.liang@intel.com wrote:
>
> > Here, all the possible failure cases are listed.
> > Terms:
> > - new: the current PMU counter value, read via rdpmcl().
> > - prev: the previous counter value, stored in &hwc->prev_count.
> > - in PMI / not in PMI: whether the event update happens in the PMI
> >   handler or not.
> >
> > Current code to calculate delta:
> >
> >     delta = (new << shift) - (prev << shift);
> >     delta >>= shift;
> >
> > Case A: Not in PMI. new > prev, but delta is negative.
> > That is the failure case of Test 2.
> > delta is of type s64; new and prev are u64. If new is big enough,
> > bit 63 of the result may still be set after the left shifts and the
> > subtraction, so after the conversion to s64 the sign flag is set.
> > Since delta is s64, the subsequent right shift is arithmetic, which
> > copies the sign bit into the vacated upper bit positions of delta.
> > It can be fixed by adding the max count value.
> >
> > Here is the real data for Test 2 on KNL:
> >
> >     new   = aea96e1927
> >     prev  = ffffff0000000001
> >     delta = aea96e1927000000 - 1000000 = aea96e1926000000
> >     aea96e1926000000 >> 24 = ffffffaea96e1926   <- negative delta
>
> How can this happen? IIRC the thing increments, we program a negative
> value, and when it passes 0 we generate a PMI.
>
Yeah, that's the part I don't quite understand either, hence my initial
question. What you describe is what is supposed to happen regardless of
the counter width.

> And note that we _ALWAYS_ set the INT bits, even for !sampling events.
> Also note we set max_period to (1<<31) - 1, so we should never exceed
> 31 bits.
>
Max period is limited by the number of bits the kernel can write to the
MSR. It used to be 31, and is now 47 for the core PMU, as per the patch
Kan pointed to.

> > Case B: In PMI. new > prev. delta is positive.
> > That's the failure case of Test 3.
> > The PMI is triggered by an overflow, but there is latency between
> > the overflow and the PMI handler, so new holds only a small value.
> > The current calculation loses the max count value.
>
> That doesn't make sense, per the 31-bit limit.
>
> > Case C: In PMI. new < prev. delta is negative.
> > The PMU counter may start from a big value, e.g. when the fixed
> > period is small.
> > It can be fixed by adding the max count value.
>
> Doesn't make sense, how can this happen?
>
The only issue I can think of is the case where the counter is
reprogrammed to a different value, which could be much lower or higher
than the last saved value, for instance in frequency mode. Otherwise,
each counter increments monotonically from the programmed period.
Counting events are also programmed that way.