2023-11-22 05:15:33

by Anshuman Khandual

Subject: Re: [V14 0/8] arm64/perf: Enable branch stack sampling

On 11/14/23 22:47, James Clark wrote:
>
>
> On 14/11/2023 05:13, Anshuman Khandual wrote:
>> This series enables perf branch stack sampling support on arm64 platform
>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>> the relevant register definitions could be accessed here.
>>
> [...]
>>
>> --------------------------- Virtualisation support ------------------------
>>
>> - Branch stack sampling is not currently supported inside the guest (TODO)
>>
>> - FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
>> - Future support in guest requires emulating FEAT_BRBE
>
> If you never add support for the host looking into a guest, and you save

But that seems to be a valid use case. Is there a particular concern about
why such a capability should not or could not be added for BRBE?

> and restore all the BRBINF[n] registers, I think you might be able to
> just let the guest do whatever it wants with BRBE and not trap and
> emulate it? Maybe there is some edge case why that wouldn't work, but
> it's worth thinking about.

Right, if host tracing of the guest is not supported (although I am still
wondering why it should not be), saving and restoring the complete BRBE
state, i.e. all system registers that can be accessed from the guest, would
let the guest do whatever it wants with BRBE without requiring the
trap-and-emulate model.
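
To make the save/restore idea concrete, here is a toy model in Python. It is
only a sketch under assumptions: it touches no real hardware, the ToyCpu,
ToyVcpu, guest_enter and guest_exit names are hypothetical and not taken from
the series or from KVM, and only a few BRBE registers are modelled. A real
implementation would also have to stop recording around the switch and handle
the record banks.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BrbeState:
    """Toy stand-in for the guest-visible BRBE registers (no hardware access)."""
    # Control registers, modelled as plain integers.
    ctrl: Dict[str, int] = field(default_factory=lambda: {
        "BRBCR_EL1": 0, "BRBFCR_EL1": 0, "BRBTS_EL1": 0})
    # Each record is a (BRBSRC, BRBTGT, BRBINF) triple.
    records: List[Tuple[int, int, int]] = field(default_factory=list)

    def copy(self) -> "BrbeState":
        return BrbeState(dict(self.ctrl), list(self.records))


class ToyCpu:
    """One physical CPU with a single BRBE instance."""
    def __init__(self) -> None:
        self.brbe = BrbeState()


class ToyVcpu:
    """Hypothetical per-vCPU context holding the guest's BRBE state."""
    def __init__(self) -> None:
        self.saved_brbe = BrbeState()
        self.host_brbe: Optional[BrbeState] = None


def guest_enter(cpu: ToyCpu, vcpu: ToyVcpu) -> None:
    # Stash the host's complete BRBE state, then load the guest's.
    vcpu.host_brbe = cpu.brbe.copy()
    cpu.brbe = vcpu.saved_brbe.copy()


def guest_exit(cpu: ToyCpu, vcpu: ToyVcpu) -> None:
    # Stash whatever the guest left in BRBE, then put the host's state back.
    vcpu.saved_brbe = cpu.brbe.copy()
    cpu.brbe = vcpu.host_brbe.copy()
```

With the whole state swapped on every entry and exit, neither side sees the
other's records, so nothing needs to be trapped. The flip side is that the
host cannot sample the guest's branches in this model.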

>
> For BRBE specifically I don't see much of a use case for hosts looking
> into a guest, at least not like with PMU counters.

But how is it any different from normal PMU counters? Branch records do
provide statistical insights into hot sections in the guest.


2023-11-23 16:24:01

by James Clark

Subject: Re: [V14 0/8] arm64/perf: Enable branch stack sampling



On 22/11/2023 05:15, Anshuman Khandual wrote:
> On 11/14/23 22:47, James Clark wrote:
>>
>>
>> On 14/11/2023 05:13, Anshuman Khandual wrote:
>>> This series enables perf branch stack sampling support on arm64 platform
>>> via a new arch feature called Branch Record Buffer Extension (BRBE). All
>>> the relevant register definitions could be accessed here.
>>>
>> [...]
>>>
>>> --------------------------- Virtualisation support ------------------------
>>>
>>> - Branch stack sampling is not currently supported inside the guest (TODO)
>>>
>>> - FEAT_BRBE advertised as absent via clearing ID_AA64DFR0_EL1.BRBE
>>> - Future support in guest requires emulating FEAT_BRBE
>>
>> If you never add support for the host looking into a guest, and you save
>
> But that seems to be a valid use case though. Is there a particular concern
> why such capability should or could not be added for BRBE ?
>

What's the use case exactly? You wouldn't even have the binary mappings
of the guest without running perf inside the guest too, and at that
point you might as well have just done the BRBE recording from inside
the guest.

My only concern is the effort required to implement it versus its
usefulness. That's not to say we should never implement fully shared BRBE
between host and guest; we could always do that later. My idea was just to
get BRBE working inside guests sooner.

>> and restore all the BRBINF[n] registers, I think you might be able to
>> just let the guest do whatever it wants with BRBE and not trap and
>> emulate it? Maybe there is some edge case why that wouldn't work, but
>> it's worth thinking about.
>
> Right, in case host tracing of the guest is not supported (although still
> wondering why it should not be), saving and restoring complete BRBE state
> i.e all system registers that can be accessed from guest, would let guest
> do what ever it wants with BRBE without requiring the trap-emulate model.
>
>>
>> For BRBE specifically I don't see much of a use case for hosts looking
>> into a guest, at least not like with PMU counters.
> But how is it any different from normal PMU counters ? Branch records do
> provide statistical insights into hot sections in the guest.
>

There is a big difference: PMU counters can be used to infer general
things about a system without any extra information. That's something
that could be used by a monitoring task or by someone looking at a guest
running a known workload.

But for BRBE you need the binaries, mappings, scheduling events, thread
switches, etc. to make any sense of the addresses in the branch buffers;
otherwise they're just random numbers from who knows which process.
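
As a rough illustration of that dependency, the sketch below resolves a
branch address against a table of mappings. The (start, end, name) tuple
layout and the resolve() helper are hypothetical and are not perf's actual
data structures; a real tool would also need to know which process each
record belongs to.

```python
import bisect
from typing import List, Optional, Tuple

# Hypothetical mapping entry: (start address, end address, binary name).
Mapping = Tuple[int, int, str]


def resolve(addr: int, mappings: List[Mapping]) -> Optional[str]:
    """Return 'binary+offset' for addr, or None if no mapping covers it.

    mappings must be sorted by start address and must not overlap.
    """
    starts = [m[0] for m in mappings]
    # Index of the last mapping that starts at or below addr.
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, end, name = mappings[i]
        if addr < end:
            return f"{name}+{addr - start:#x}"
    return None
```

With the guest's mappings in hand, an address turns into something like
"app+0x123". With an empty table, which is roughly the host's position when
perf is not also running inside the guest, every lookup returns None.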