2023-06-12 10:49:52

by Yang Jihong

Subject: [RFC] Adding support for setting the affinity of the recording process

Hello everyone,

Currently, perf-record supports profiling an existing process, thread,
or a specified command.

Sometimes we may need to set CPU affinity of the target process before
recording:

# taskset -pc <cpus> <pid>
# perf record -p <pid> -- sleep 10

or:

# perf record -- taskset -c <cpus> COMMAND

I'm thinking about getting perf to support setting the affinity of the
recording process, for example:

1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process
to <cpus2>, and record:

# perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10

and

2. set CPU affinity of the COMMAND and record:

# perf record --taskset-command <cpus> COMMAND
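
To make the proposed `-p <pid>/<cpus>` syntax concrete, here is a minimal Python sketch of how such a spec could be parsed. The syntax exists only in this RFC, and `parse_cpu_list` and `parse_pid_affinity_spec` are hypothetical helper names, not perf code. The CPU list format is assumed to be the one `taskset -c` accepts (for example `0-3,8`).

```python
def parse_cpu_list(text):
    """Parse a taskset-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in CPU list {text!r}")
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"descending range {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def parse_pid_affinity_spec(spec):
    """Parse "<pid1>/<cpus1>:<pid2>/<cpus2>" into {pid: set_of_cpus}.

    Hypothetical syntax taken from the RFC; perf does not accept it today.
    """
    result = {}
    for item in spec.split(":"):
        pid_text, sep, cpu_text = item.partition("/")
        if not sep:
            raise ValueError(f"missing '/<cpus>' in {item!r}")
        pid = int(pid_text)
        if pid in result:
            raise ValueError(f"pid {pid} listed twice")
        result[pid] = parse_cpu_list(cpu_text)
    return result
```

For instance, `parse_pid_affinity_spec("1234/0-3:5678/4,6")` yields one CPU set per pid. Each set could then be handed to `os.sched_setaffinity(pid, cpus)`, the Python wrapper around the sched_setaffinity(2) call that taskset also relies on.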

In doing so, perf, as an observer, actually changes some of the
properties of the target process, which may be contrary to the purpose
of the perf tool.


Will we consider accepting this approach?

Thanks,
Yang.


2023-06-12 15:25:50

by Arnaldo Carvalho de Melo

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Em Mon, Jun 12, 2023 at 06:26:10PM +0800, Yang Jihong escreveu:
> Hello everyone,
>
> Currently, perf-record supports profiling an existing process, thread, or a
> specified command.
>
> Sometimes we may need to set CPU affinity of the target process before
> recording:
>
> # taskset -pc <cpus> <pid>
> # perf record -p <pid> -- sleep 10
>
> or:
>
> # perf record -- taskset -c <cpus> COMMAND
>
> I'm thinking about getting perf to support setting the affinity of the
> recording process, for example:

not of the 'recording process' but the 'observed process', right?

> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process to
> <cpus2>, and record:
>
> # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10

but what would be the semantic for what is being observed? Would this
result in it recording events on that CPU or just for that process (that
now runs just on that CPU)?

Without affinity setting that could mean: observe just that process when
it runs on that CPU.

But could you please spell out the use case, why do you need this, is
this so common (for you) that you repeatedly need to first taskset, then
perf, etc?

> and
>
> 2. set CPU affinity of the COMMAND and record:
>
> # perf record --taskset-command <cpus> COMMAND
>
> In doing so, perf, as an observer, actually changes some of the properties
> of the target process, which may be contrary to the purpose of perf tool.

Up for discussion, but I don't think this is that much a problem if it
streamlines common observability sessions/experimentations.

> Will we consider accepting this approach?

- Arnaldo

2023-06-12 15:26:21

by Peter Zijlstra

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

On Mon, Jun 12, 2023 at 11:24:26AM -0300, Arnaldo Carvalho de Melo wrote:
> But could you please spell out the use case, why do you need this, is
> this so common (for you) that you repeatedly need to first taskset, then
> perf, etc?

I'm thinking this is due to big.LITTLE things where the PMUs on the CPUs
are different. Intel recently having stepped into this trainwreck,
there's now a ton more people 'enjoying' it ...

So what people often do is affine the process to one type of CPU and
then perf it so that their measurements are consistent.
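
As a rough illustration of finding "one type of CPU", the Python sketch below lists which CPUs each CPU PMU covers by reading sysfs. It assumes that core PMUs expose a `cpus` file under /sys/bus/event_source/devices (as I understand it, `cpu_core` and `cpu_atom` on Intel hybrid, and per-cluster `armv8_*` PMUs on Arm big.LITTLE); PMU names vary by platform, and `cpu_lists_by_pmu` is a made-up helper name.

```python
from pathlib import Path


def cpu_lists_by_pmu(root="/sys/bus/event_source/devices"):
    """Return {pmu_name: cpu_list_string} for PMUs that expose a `cpus` file.

    The strings come straight from sysfs (for example "0-15"), so they can
    be used as the <cpus> argument of `taskset -c`.
    """
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for entry in sorted(base.iterdir()):
        cpus_file = entry / "cpus"
        if cpus_file.is_file():
            text = cpus_file.read_text().strip()
            if text:
                result[entry.name] = text
    return result
```

On a machine without heterogeneous PMUs the result may simply be empty, since the assumption about the `cpus` file does not hold everywhere.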

It all sucks, but given the situation, it might be the best option :/

Ian was working on improving the whole hybrid thing, perhaps he has more
opinions.

2023-06-12 16:02:18

by James Clark

Subject: Re: [RFC] Adding support for setting the affinity of the recording process



On 12/06/2023 11:26, Yang Jihong wrote:
> Hello everyone,
>
> Currently, perf-record supports profiling an existing process, thread,
> or a specified command.
>
> Sometimes we may need to set CPU affinity of the target process before
> recording:
>
>   # taskset -pc <cpus> <pid>
>   # perf record -p <pid> -- sleep 10
>
> or:
>
>   # perf record -- taskset -c <cpus> COMMAND
>
> I'm thinking about getting perf to support setting the affinity of the
> recording process, for example:
>
> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process
> to <cpus2>,  and record:
>
>   # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10
>

I'm not sure if this is necessary. You can already do this with taskset
when you launch the processes or for existing ones.

> and
>
> 2. set CPU affinity of the COMMAND and record:
>
>   # perf record --taskset-command <cpus> COMMAND
>
> In doing so, perf, as an observer, actually changes some of the
> properties of the target process, which may be contrary to the purpose
> of perf tool.
>
>
> Will we consider accepting this approach?
>

For #2 I do this sometimes, but I prefix the perf command with taskset
because otherwise there is a small time between when taskset does its
thing and launching the child process that it runs in the wrong place.

Then one issue with the above method is that perf itself gets pinned to
those CPUs as well. I suppose that could influence your application but
I've never had an issue with it.

If you really can't live with perf also being pinned to those CPUs I
would say it makes sense to add options for #2. Otherwise I would just
run everything under taskset and no changes are needed.

I think you would still need to have perf itself pinned to the CPUs just
before it does the fork and exec, and then after that it can undo its
pinning. Otherwise you'd still get that small time running on the wrong
cores.
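
The pin, fork/exec, then unpin sequence described above can be sketched in Python as follows. This is only an illustration of the ordering, not perf code: `spawn_pinned` is a hypothetical name, and it relies on the Linux behaviour that a child inherits its parent's affinity mask across fork and exec.

```python
import os
import subprocess


def spawn_pinned(argv, cpus):
    """Start argv so that it is restricted to `cpus` from the very start.

    The caller pins itself first, so the child inherits the mask across
    fork and exec; the caller's original mask is restored afterwards.
    Linux only (uses sched_setaffinity(2) through the os module).
    """
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        # The child is created while the restricted mask is in effect.
        child = subprocess.Popen(argv)
    finally:
        # Undo the pinning of the launcher; the child keeps its own mask.
        os.sched_setaffinity(0, original)
    return child
```

Setting the affinity in the child after the fork instead would leave the short window on the wrong cores that the paragraph above warns about.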

James

> Thanks,
> Yang.

2023-06-13 02:18:13

by Yang Jihong

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Hello,

Sorry, I forgot to add another recipient in the last email. Sending it again.

On 2023/6/12 22:24, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 12, 2023 at 06:26:10PM +0800, Yang Jihong escreveu:
>> Hello everyone,
>>
>> Currently, perf-record supports profiling an existing process, thread, or a
>> specified command.
>>
>> Sometimes we may need to set CPU affinity of the target process before
>> recording:
>>
>> # taskset -pc <cpus> <pid>
>> # perf record -p <pid> -- sleep 10
>>
>> or:
>>
>> # perf record -- taskset -c <cpus> COMMAND
>>
>> I'm thinking about getting perf to support setting the affinity of the
>> recording process, for example:
>
> not of the 'recording process' but the 'observed process', right?
>


Yes, it's the process being observed.

>> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process to
>> <cpus2>, and record:
>>
>> # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10
>
> but what would be the semantic for what is being observed? Would this
> result in it recording events on that CPU or just for that process (that
> now runs just on that CPU)?
>

Just for that process, which now runs only on the specified CPU.

> Without affinity setting that could mean: observe just that process when
> it runs on that CPU.
>
> But could you please spell out the use case, why do you need this, is
> this so common (for you) that you repeatedly need to first taskset, then
> perf, etc?

As Peter said, big.LITTLE is a common scenario where a process may
behave differently on different CPUs.

There are other scenarios. For example, if I run a server and a client
without setting their affinity, the scheduler may place them on the same
NUMA node in one run and on different NUMA nodes in another.
In that case, performance may fluctuate for reasons such as cache
misses. When analyzing performance problems, we sometimes care about
the stability of the results.
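
For the NUMA case, a minimal Python sketch of looking up which CPUs belong to each node is shown below. It reads the `cpulist` files under /sys/devices/system/node; `cpu_lists_by_numa_node` is a made-up helper name, and the idea of feeding the result to `taskset -c` is an assumption about the workflow rather than anything perf does.

```python
import re
from pathlib import Path


def cpu_lists_by_numa_node(root="/sys/devices/system/node"):
    """Return {node_id: cpu_list_string} read from node<N>/cpulist."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for entry in base.iterdir():
        match = re.fullmatch(r"node(\d+)", entry.name)
        cpulist = entry / "cpulist"
        if match and cpulist.is_file():
            text = cpulist.read_text().strip()
            if text:  # skip nodes that have memory but no CPUs
                result[int(match.group(1))] = text
    return result
```

The server could then be started under `taskset -c` with the list for one node and the client with the list for the same or another node, so that every run uses the same placement.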

>
>> and
>>
>> 2. set CPU affinity of the COMMAND and record:
>>
>> # perf record --taskset-command <cpus> COMMAND
>>
>> In doing so, perf, as an observer, actually changes some of the properties
>> of the target process, which may be contrary to the purpose of perf tool.
>
> Up for discussion, but I don't think this is that much a problem if it
> streamlines common observability sessions/experimentations.

If perf is used to set the affinity of the observed process, it
actually interferes with the behavior of the target process (for
example, by affecting scheduling).
In this scenario, perf is no longer just a simple observer. Therefore, I
am not sure whether this behavior is acceptable.

Thank you for your reply.

Thanks,
Yang.

2023-06-13 02:23:36

by Yang Jihong

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Hello,

On 2023/6/12 23:05, Peter Zijlstra wrote:
> On Mon, Jun 12, 2023 at 11:24:26AM -0300, Arnaldo Carvalho de Melo wrote:
>> But could you please spell out the use case, why do you need this, is
>> this so common (for you) that you repeatedly need to first taskset, then
>> perf, etc?
>
> I'm thinking this is due to big.LITTLE things where the PMUs on the CPUs
> are different. Intel recently having stepped into this trainwreck,
> there's now a ton more people 'enjoying' it ...
>
> So what people often do is affine the process to one type of CPU and
> then perf it so that their measurements are consistent.
Yes, that is one such scenario. The purpose of setting affinity is to
ensure the stability of the recorded events.

>
> It all sucks, but given the situation, it might be the best option :/
>
> Ian was working on improving the whole hybrid thing, perhaps he has more
> opinions.
>
Thank you for your reply.

> .
>

Thanks,
Yang.

2023-06-13 02:52:20

by Yang Jihong

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Hello,

On 2023/6/12 23:27, James Clark wrote:
>
>
> On 12/06/2023 11:26, Yang Jihong wrote:
>> Hello everyone,
>>
>> Currently, perf-record supports profiling an existing process, thread,
>> or a specified command.
>>
>> Sometimes we may need to set CPU affinity of the target process before
>> recording:
>>
>>   # taskset -pc <cpus> <pid>
>>   # perf record -p <pid> -- sleep 10
>>
>> or:
>>
>>   # perf record -- taskset -c <cpus> COMMAND
>>
>> I'm thinking about getting perf to support setting the affinity of the
>> recording process, for example:
>>
>> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process
>> to <cpus2>,  and record:
>>
>>   # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10
>>
>
> I'm not sure if this is necessary. You can already do this with taskset
> when you launch the processes or for existing ones.

Yes, that's what we're doing now, and I'm thinking about whether perf
can support this "taskset" feature.

>
>> and
>>
>> 2. set CPU affinity of the COMMAND and record:
>>
>>   # perf record --taskset-command <cpus> COMMAND
>>
>> In doing so, perf, as an observer, actually changes some of the
>> properties of the target process, which may be contrary to the purpose
>> of perf tool.
>>
>>
>> Will we consider accepting this approach?
>>
>
> For #2 I do this sometimes, but I prefix the perf command with taskset
> because otherwise there is a small time between when taskset does its
> thing and launching the child process that it runs in the wrong place.
>
> Then one issue with the above method is that perf itself gets pinned to
> those CPUs as well. I suppose that could influence your application but
> I've never had an issue with it.
>
> If you really can't live with perf also being pinned to those CPUs I
> would say it makes sense to add options for #2. Otherwise I would just
> run everything under taskset and no changes are needed.

If the "perf" process and the target process are pinned to the same CPU
and the CPU usage of the target process is high, perf's data
collection may be affected. Therefore, in this case, we may need to pin
the target process and "perf" to different CPUs.

>
> I think you would still need to have perf itself pinned to the CPUs just
> before it does the fork and exec, and then after that it can undo its
> pinning. Otherwise you'd still get that small time running on the wrong
> cores.
>

Thanks for your advice. Alternatively, we can support setting different
affinities for the "perf" process and the target process.


Thanks,
Yang.

2023-06-13 06:46:34

by Namhyung Kim

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Hello,

On Mon, Jun 12, 2023 at 7:28 PM Yang Jihong wrote:
>
> Hello,
>
> On 2023/6/12 23:27, James Clark wrote:
> >
> >
> > On 12/06/2023 11:26, Yang Jihong wrote:
> >> Hello everyone,
> >>
> >> Currently, perf-record supports profiling an existing process, thread,
> >> or a specified command.
> >>
> >> Sometimes we may need to set CPU affinity of the target process before
> >> recording:
> >>
> >> # taskset -pc <cpus> <pid>
> >> # perf record -p <pid> -- sleep 10
> >>
> >> or:
> >>
> >> # perf record -- taskset -c <cpus> COMMAND
> >>
> >> I'm thinking about getting perf to support setting the affinity of the
> >> recording process, for example:
> >>
> >> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process
> >> to <cpus2>, and record:
> >>
> >> # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10
> >>
> >
> > I'm not sure if this is necessary. You can already do this with taskset
> > when you launch the processes or for existing ones.
>
> Yes, that's what we're doing now, and I'm thinking about whether perf
> can support this "taskset" feature.

I agree with James that it looks out of scope of perf tools.
You can always use `taskset` for external processes.

>
> >
> >> and
> >>
> >> 2. set CPU affinity of the COMMAND and record:
> >>
> >> # perf record --taskset-command <cpus> COMMAND
> >>
> >> In doing so, perf, as an observer, actually changes some of the
> >> properties of the target process, which may be contrary to the purpose
> >> of perf tool.
> >>
> >>
> >> Will we consider accepting this approach?
> >>
> >
> > For #2 I do this sometimes, but I prefix the perf command with taskset
> > because otherwise there is a small time between when taskset does its
> > thing and launching the child process that it runs in the wrong place.
> >
> > Then one issue with the above method is that perf itself gets pinned to
> > those CPUs as well. I suppose that could influence your application but
> > I've never had an issue with it.
> >
> > If you really can't live with perf also being pinned to those CPUs I
> > would say it makes sense to add options for #2. Otherwise I would just
> > run everything under taskset and no changes are needed.
>
> If "perf" process and the target process are pinned to the same CPU,
> and the CPU usage of the target process is high, the perf data
> collection may be affected. Therefore, in this case, we may need to pin
> the target process and "perf" to different CPUs.
>
> >
> > I think you would still need to have perf itself pinned to the CPUs just
> > before it does the fork and exec, and then after that it can undo its
> > pinning. Otherwise you'd still get that small time running on the wrong
> > cores.
> >
>
> Thanks for your advice, or we can support setting different affinities
> for the "perf" process and the target process.

When it comes to controlling `perf`, you can use --threads=<spec>
option which supports fairly complex control for parallelism and
affinity.
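
For reference, as I read perf-record(1), a user-defined `--threads` spec is a colon-separated list of `<cpus mask>/<affinity mask>` pairs, one per data-streaming thread. The small Python sketch below only assembles such a string; `threads_spec` is a hypothetical helper, and the exact rules perf applies (masks must not overlap, invalid CPUs are ignored) should be checked against the man page.

```python
def threads_spec(layout):
    """Build a --threads=<spec> value from (monitored, affinity) pairs.

    Each element of `layout` is a pair of CPU-list strings: the CPUs a
    thread monitors and the CPUs that thread itself may run on,
    for example ("0,2-4", "2-4").
    """
    parts = []
    for monitored, affinity in layout:
        if not monitored or not affinity:
            raise ValueError("both CPU lists must be non-empty")
        parts.append(f"{monitored}/{affinity}")
    if not parts:
        raise ValueError("layout must contain at least one thread")
    return ":".join(parts)
```

With the man page's own example, `threads_spec([("0,2-4", "2-4"), ("1,5-7", "5-7")])` gives a value that could be passed as `--threads=0,2-4/2-4:1,5-7/5-7`.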

Thanks,
Namhyung

2023-06-13 07:28:14

by Yang Jihong

Subject: Re: [RFC] Adding support for setting the affinity of the recording process

Hello,

On 2023/6/13 13:50, Namhyung Kim wrote:
> Hello,
>
> On Mon, Jun 12, 2023 at 7:28 PM Yang Jihong wrote:
>>
>> Hello,
>>
>> On 2023/6/12 23:27, James Clark wrote:
>>>
>>>
>>> On 12/06/2023 11:26, Yang Jihong wrote:
>>>> Hello everyone,
>>>>
>>>> Currently, perf-record supports profiling an existing process, thread,
>>>> or a specified command.
>>>>
>>>> Sometimes we may need to set CPU affinity of the target process before
>>>> recording:
>>>>
>>>> # taskset -pc <cpus> <pid>
>>>> # perf record -p <pid> -- sleep 10
>>>>
>>>> or:
>>>>
>>>> # perf record -- taskset -c <cpus> COMMAND
>>>>
>>>> I'm thinking about getting perf to support setting the affinity of the
>>>> recording process, for example:
>>>>
>>>> 1. set the CPU affinity of the <pid1> process to <cpus1>, <pid2> process
>>>> to <cpus2>, and record:
>>>>
>>>> # perf record -p <pid1>/<cpus1>:<pid2>/<cpus2> -- sleep 10
>>>>
>>>
>>> I'm not sure if this is necessary. You can already do this with taskset
>>> when you launch the processes or for existing ones.
>>
>> Yes, that's what we're doing now, and I'm thinking about whether perf
>> can support this "taskset" feature.
>
> I agree with James that it looks out of scope of perf tools.
> You can always use `taskset` for external processes.
>
OK, so let's not consider this scenario.
>>
>>>
>>>> and
>>>>
>>>> 2. set CPU affinity of the COMMAND and record:
>>>>
>>>> # perf record --taskset-command <cpus> COMMAND
>>>>
>>>> In doing so, perf, as an observer, actually changes some of the
>>>> properties of the target process, which may be contrary to the purpose
>>>> of perf tool.
>>>>
>>>>
>>>> Will we consider accepting this approach?
>>>>
>>>
>>> For #2 I do this sometimes, but I prefix the perf command with taskset
>>> because otherwise there is a small time between when taskset does its
>>> thing and launching the child process that it runs in the wrong place.
>>>
>>> Then one issue with the above method is that perf itself gets pinned to
>>> those CPUs as well. I suppose that could influence your application but
>>> I've never had an issue with it.
>>>
>>> If you really can't live with perf also being pinned to those CPUs I
>>> would say it makes sense to add options for #2. Otherwise I would just
>>> run everything under taskset and no changes are needed.
>>
>> If "perf" process and the target process are pinned to the same CPU,
>> and the CPU usage of the target process is high, the perf data
>> collection may be affected. Therefore, in this case, we may need to pin
>> the target process and "perf" to different CPUs.
>>
>>>
>>> I think you would still need to have perf itself pinned to the CPUs just
>>> before it does the fork and exec, and then after that it can undo its
>>> pinning. Otherwise you'd still get that small time running on the wrong
>>> cores.
>>>
>>
>> Thanks for your advice, or we can support setting different affinities
>> for the "perf" process and the target process.
>
> When it comes to controlling `perf`, you can use --threads=<spec>
> option which supports fairly complex control for parallelism and
> affinity.
>
Yes, we can use --threads=<spec>.

Alternatively, we could simply add a parameter to pin the
COMMAND to specific CPUs.

Thank you for your reply.

Thanks,
Yang.