2020-01-11 09:58:32

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 11.01.2020 3:35, [email protected] wrote:
> <[email protected]>,Jann Horn <[email protected]>,Thomas Gleixner <[email protected]>,Tvrtko Ursulin <[email protected]>,Lionel Landwerlin <[email protected]>,linux-kernel <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>
> From: Arnaldo Carvalho de Melo <[email protected]>
> Message-ID: <[email protected]>
>
> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
>>
>>
>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
>> wrote:
>>>
>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>> Arnaldo Carvalho de Melo <[email protected]> wrote:
>>>
>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>> <[email protected]> wrote:
>>>>>> Again, this only allows attaching to previously created kprobes,
>> it does
>>>>>> not allow creating kprobes, right?
>>>>
>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>> kprobes.
>>>>
>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>> creating kprobes.
>>>>
>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>> interface to
>>>>> define new kprobe events, and those events are treated as
>> completely same as
>>>>> tracepoint events. On the other hand, ebpf tries to define new
>> probe event
>>>>> via perf_event interface. Above one is that interface. IOW, it
>> creates new kprobe.
>>>>
>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>> interface for creating kprobes/uprobes?
>>>
>>> Would you mean perf probe to switch to perf_event_open()?
>>> No, perf probe is for setting up the ftrace probe events. I think we
>> can add an
>>> option to use perf_event_open(). But current kprobe creation from
>> perf_event_open()
>>> is separated from ftrace by design.
>>
>> I guess we can extend event parser to understand kprobe directly.
>> Instead of
>>
>> perf probe kernel_func
>> perf stat/record -e probe:kernel_func ...
>>
>> We can just do
>>
>> perf stat/record -e kprobe:kernel_func ...
>
>
> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.

Arnaldo, Masami, Song,

What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
Could you please also review and comment on patch 5/9 for bpf_trace.c?

Thanks,
Alexey

>
> Regards,
>
> - Arnaldo
>
>>
>> Thanks,
>> Song
>


2020-01-13 20:41:00

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process



> On Jan 11, 2020, at 1:57 AM, Alexey Budankov <[email protected]> wrote:
>
>
> On 11.01.2020 3:35, [email protected] wrote:
>> <[email protected]>,Jann Horn <[email protected]>,Thomas Gleixner <[email protected]>,Tvrtko Ursulin <[email protected]>,Lionel Landwerlin <[email protected]>,linux-kernel <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>,"[email protected]" <[email protected]>
>> From: Arnaldo Carvalho de Melo <[email protected]>
>> Message-ID: <[email protected]>
>>
>> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
>>>
>>>
>>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
>>> wrote:
>>>>
>>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>>> Arnaldo Carvalho de Melo <[email protected]> wrote:
>>>>
>>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>>> <[email protected]> wrote:
>>>>>>> Again, this only allows attaching to previously created kprobes,
>>> it does
>>>>>>> not allow creating kprobes, right?
>>>>>
>>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>>> kprobes.
>>>>>
>>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>>> creating kprobes.
>>>>>
>>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>>> interface to
>>>>>> define new kprobe events, and those events are treated as
>>> completely same as
>>>>>> tracepoint events. On the other hand, ebpf tries to define new
>>> probe event
>>>>>> via perf_event interface. Above one is that interface. IOW, it
>>> creates new kprobe.
>>>>>
>>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>>> interface for creating kprobes/uprobes?
>>>>
>>>> Would you mean perf probe to switch to perf_event_open()?
>>>> No, perf probe is for setting up the ftrace probe events. I think we
>>> can add an
>>>> option to use perf_event_open(). But current kprobe creation from
>>> perf_event_open()
>>>> is separated from ftrace by design.
>>>
>>> I guess we can extend event parser to understand kprobe directly.
>>> Instead of
>>>
>>> perf probe kernel_func
>>> perf stat/record -e probe:kernel_func ...
>>>
>>> We can just do
>>>
>>> perf stat/record -e kprobe:kernel_func ...
>>
>>
>> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
>
> Arnaldo, Masami, Song,
>
> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?

I think we should at least allow CAP_SYS_PERFMON to create some kprobes. Maybe we can
limited that to per-task kprobes, and the task should be owned by the user?

Thanks,
Song


2020-01-14 03:47:22

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Sat, 11 Jan 2020 12:57:18 +0300
Alexey Budankov <[email protected]> wrote:

>
> On 11.01.2020 3:35, [email protected] wrote:

> > Message-ID: <[email protected]>
> >
> > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
> >>
> >>
> >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
> >> wrote:
> >>>
> >>> On Fri, 10 Jan 2020 13:45:31 -0300
> >>> Arnaldo Carvalho de Melo <[email protected]> wrote:
> >>>
> >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> >> <[email protected]> wrote:
> >>>>>> Again, this only allows attaching to previously created kprobes,
> >> it does
> >>>>>> not allow creating kprobes, right?
> >>>>
> >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> >>>>>> kprobes.
> >>>>
> >>>>>> As might be clear; I don't actually know what the user-ABI is for
> >>>>>> creating kprobes.
> >>>>
> >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> >> interface to
> >>>>> define new kprobe events, and those events are treated as
> >> completely same as
> >>>>> tracepoint events. On the other hand, ebpf tries to define new
> >> probe event
> >>>>> via perf_event interface. Above one is that interface. IOW, it
> >> creates new kprobe.
> >>>>
> >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> >>>> interface for creating kprobes/uprobes?
> >>>
> >>> Would you mean perf probe to switch to perf_event_open()?
> >>> No, perf probe is for setting up the ftrace probe events. I think we
> >> can add an
> >>> option to use perf_event_open(). But current kprobe creation from
> >> perf_event_open()
> >>> is separated from ftrace by design.
> >>
> >> I guess we can extend event parser to understand kprobe directly.
> >> Instead of
> >>
> >> perf probe kernel_func
> >> perf stat/record -e probe:kernel_func ...
> >>
> >> We can just do
> >>
> >> perf stat/record -e kprobe:kernel_func ...
> >
> >
> > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
>
> Arnaldo, Masami, Song,
>
> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> Could you please also review and comment on patch 5/9 for bpf_trace.c?

As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
to open it for enabling/disabling kprobes, not for creation.

If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
to shoot their foot by their own risk, I'm OK to allow it. (Even though,
it should check the max number of probes to be created by something like
ulimit)
I think nowadays we have fixed all such kernel crash problems on x86,
but not sure for other archs, especially on the devices I can not reach.
I need more help to stabilize it.

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-01-14 05:19:10

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <[email protected]> wrote:
>
> On Sat, 11 Jan 2020 12:57:18 +0300
> Alexey Budankov <[email protected]> wrote:
>
> >
> > On 11.01.2020 3:35, [email protected] wrote:
>
> > > Message-ID: <[email protected]>
> > >
> > > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
> > >>
> > >>
> > >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
> > >> wrote:
> > >>>
> > >>> On Fri, 10 Jan 2020 13:45:31 -0300
> > >>> Arnaldo Carvalho de Melo <[email protected]> wrote:
> > >>>
> > >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> > >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> > >> <[email protected]> wrote:
> > >>>>>> Again, this only allows attaching to previously created kprobes,
> > >> it does
> > >>>>>> not allow creating kprobes, right?
> > >>>>
> > >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> > >>>>>> kprobes.
> > >>>>
> > >>>>>> As might be clear; I don't actually know what the user-ABI is for
> > >>>>>> creating kprobes.
> > >>>>
> > >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> > >> interface to
> > >>>>> define new kprobe events, and those events are treated as
> > >> completely same as
> > >>>>> tracepoint events. On the other hand, ebpf tries to define new
> > >> probe event
> > >>>>> via perf_event interface. Above one is that interface. IOW, it
> > >> creates new kprobe.
> > >>>>
> > >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> > >>>> interface for creating kprobes/uprobes?
> > >>>
> > >>> Would you mean perf probe to switch to perf_event_open()?
> > >>> No, perf probe is for setting up the ftrace probe events. I think we
> > >> can add an
> > >>> option to use perf_event_open(). But current kprobe creation from
> > >> perf_event_open()
> > >>> is separated from ftrace by design.
> > >>
> > >> I guess we can extend event parser to understand kprobe directly.
> > >> Instead of
> > >>
> > >> perf probe kernel_func
> > >> perf stat/record -e probe:kernel_func ...
> > >>
> > >> We can just do
> > >>
> > >> perf stat/record -e kprobe:kernel_func ...
> > >
> > >
> > > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> >
> > Arnaldo, Masami, Song,
> >
> > What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> > Could you please also review and comment on patch 5/9 for bpf_trace.c?
>
> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> to open it for enabling/disabling kprobes, not for creation.
>
> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> it should check the max number of probes to be created by something like
> ulimit)
> I think nowadays we have fixed all such kernel crash problems on x86,
> but not sure for other archs, especially on the devices I can not reach.
> I need more help to stabilize it.

I don't see how enable/disable is any safer than creation.
If there are kernel bugs in kprobes the kernel will crash anyway.
I think such partial CAP_SYS_PERFMON would be very confusing to the users.
CAP_* is about delegation of root privileges to non-root.
Delegating some of it is ok, but disallowing creation makes it useless
for bpf tracing, so we would need to add another CAP later.
Hence I suggest to do it right away instead of breaking
sys_perf_even_open() access into two CAPs.

2020-01-14 09:49:10

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 14.01.2020 8:17, Alexei Starovoitov wrote:
> On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <[email protected]> wrote:
>>
>> On Sat, 11 Jan 2020 12:57:18 +0300
>> Alexey Budankov <[email protected]> wrote:
>>
>>>
>>> On 11.01.2020 3:35, [email protected] wrote:
>>
>>>> Message-ID: <[email protected]>
>>>>
>>>> On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
>>>>>
>>>>>
>>>>>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> On Fri, 10 Jan 2020 13:45:31 -0300
>>>>>> Arnaldo Carvalho de Melo <[email protected]> wrote:
>>>>>>
>>>>>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
>>>>>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
>>>>> <[email protected]> wrote:
>>>>>>>>> Again, this only allows attaching to previously created kprobes,
>>>>> it does
>>>>>>>>> not allow creating kprobes, right?
>>>>>>>
>>>>>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
>>>>>>>>> kprobes.
>>>>>>>
>>>>>>>>> As might be clear; I don't actually know what the user-ABI is for
>>>>>>>>> creating kprobes.
>>>>>>>
>>>>>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
>>>>> interface to
>>>>>>>> define new kprobe events, and those events are treated as
>>>>> completely same as
>>>>>>>> tracepoint events. On the other hand, ebpf tries to define new
>>>>> probe event
>>>>>>>> via perf_event interface. Above one is that interface. IOW, it
>>>>> creates new kprobe.
>>>>>>>
>>>>>>> Masami, any plans to make 'perf probe' use the perf_event_open()
>>>>>>> interface for creating kprobes/uprobes?
>>>>>>
>>>>>> Would you mean perf probe to switch to perf_event_open()?
>>>>>> No, perf probe is for setting up the ftrace probe events. I think we
>>>>> can add an
>>>>>> option to use perf_event_open(). But current kprobe creation from
>>>>> perf_event_open()
>>>>>> is separated from ftrace by design.
>>>>>
>>>>> I guess we can extend event parser to understand kprobe directly.
>>>>> Instead of
>>>>>
>>>>> perf probe kernel_func
>>>>> perf stat/record -e probe:kernel_func ...
>>>>>
>>>>> We can just do
>>>>>
>>>>> perf stat/record -e kprobe:kernel_func ...
>>>>
>>>>
>>>> You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
>>>
>>> Arnaldo, Masami, Song,
>>>
>>> What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
>>> Could you please also review and comment on patch 5/9 for bpf_trace.c?
>>
>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>> to open it for enabling/disabling kprobes, not for creation.
>>
>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>> it should check the max number of probes to be created by something like
>> ulimit)
>> I think nowadays we have fixed all such kernel crash problems on x86,
>> but not sure for other archs, especially on the devices I can not reach.
>> I need more help to stabilize it.
>
> I don't see how enable/disable is any safer than creation.
> If there are kernel bugs in kprobes the kernel will crash anyway.
> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> CAP_* is about delegation of root privileges to non-root.
> Delegating some of it is ok, but disallowing creation makes it useless
> for bpf tracing, so we would need to add another CAP later.
> Hence I suggest to do it right away instead of breaking
> sys_perf_even_open() access into two CAPs.
>

Alexei, Masami,

Thanks for your meaningful input.
If we know in advance that it still can crash the system in some cases and on
some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
such delegation looks premature until the crashes are avoided. So it looks like
access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
a separate patch set.

Thanks,
Alexey

2020-01-14 12:07:48

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Mon, 13 Jan 2020 21:17:49 -0800
Alexei Starovoitov <[email protected]> wrote:

> On Mon, Jan 13, 2020 at 7:25 PM Masami Hiramatsu <[email protected]> wrote:
> >
> > On Sat, 11 Jan 2020 12:57:18 +0300
> > Alexey Budankov <[email protected]> wrote:
> >
> > >
> > > On 11.01.2020 3:35, [email protected] wrote:
> >
> > > > Message-ID: <[email protected]>
> > > >
> > > > On January 10, 2020 9:23:27 PM GMT-03:00, Song Liu <[email protected]> wrote:
> > > >>
> > > >>
> > > >>> On Jan 10, 2020, at 3:47 PM, Masami Hiramatsu <[email protected]>
> > > >> wrote:
> > > >>>
> > > >>> On Fri, 10 Jan 2020 13:45:31 -0300
> > > >>> Arnaldo Carvalho de Melo <[email protected]> wrote:
> > > >>>
> > > >>>> Em Sat, Jan 11, 2020 at 12:52:13AM +0900, Masami Hiramatsu escreveu:
> > > >>>>> On Fri, 10 Jan 2020 15:02:34 +0100 Peter Zijlstra
> > > >> <[email protected]> wrote:
> > > >>>>>> Again, this only allows attaching to previously created kprobes,
> > > >> it does
> > > >>>>>> not allow creating kprobes, right?
> > > >>>>
> > > >>>>>> That is; I don't think CAP_SYS_PERFMON should be allowed to create
> > > >>>>>> kprobes.
> > > >>>>
> > > >>>>>> As might be clear; I don't actually know what the user-ABI is for
> > > >>>>>> creating kprobes.
> > > >>>>
> > > >>>>> There are 2 ABIs nowadays, ftrace and ebpf. perf-probe uses ftrace
> > > >> interface to
> > > >>>>> define new kprobe events, and those events are treated as
> > > >> completely same as
> > > >>>>> tracepoint events. On the other hand, ebpf tries to define new
> > > >> probe event
> > > >>>>> via perf_event interface. Above one is that interface. IOW, it
> > > >> creates new kprobe.
> > > >>>>
> > > >>>> Masami, any plans to make 'perf probe' use the perf_event_open()
> > > >>>> interface for creating kprobes/uprobes?
> > > >>>
> > > >>> Would you mean perf probe to switch to perf_event_open()?
> > > >>> No, perf probe is for setting up the ftrace probe events. I think we
> > > >> can add an
> > > >>> option to use perf_event_open(). But current kprobe creation from
> > > >> perf_event_open()
> > > >>> is separated from ftrace by design.
> > > >>
> > > >> I guess we can extend event parser to understand kprobe directly.
> > > >> Instead of
> > > >>
> > > >> perf probe kernel_func
> > > >> perf stat/record -e probe:kernel_func ...
> > > >>
> > > >> We can just do
> > > >>
> > > >> perf stat/record -e kprobe:kernel_func ...
> > > >
> > > >
> > > > You took the words from my mouth, exactly, that is a perfect use case, an alternative to the 'perf probe' one of making a disabled event that then gets activated via record/stat/trace, in many cases it's better, removes the explicit probe setup case.
> > >
> > > Arnaldo, Masami, Song,
> > >
> > > What do you think about making this also open to CAP_SYS_PERFMON privileged processes?
> > > Could you please also review and comment on patch 5/9 for bpf_trace.c?
> >
> > As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> > to open it for enabling/disabling kprobes, not for creation.
> >
> > If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> > to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> > it should check the max number of probes to be created by something like
> > ulimit)
> > I think nowadays we have fixed all such kernel crash problems on x86,
> > but not sure for other archs, especially on the devices I can not reach.
> > I need more help to stabilize it.
>
> I don't see how enable/disable is any safer than creation.
> If there are kernel bugs in kprobes the kernel will crash anyway.

Why? admin can test the probes before using it via bpf.

My point was only admin can make a dicision to allow (or delegate) the
priviledge to a user, and if it is OK, I don't mind it.
(Maybe it is better to give a knob to allow this CAP only for admin.)

> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> CAP_* is about delegation of root privileges to non-root.
> Delegating some of it is ok, but disallowing creation makes it useless
> for bpf tracing, so we would need to add another CAP later.
> Hence I suggest to do it right away instead of breaking
> sys_perf_even_open() access into two CAPs.

I understand that the single strong CAP will useful anyway (even if
it is CAP_SYS_ADMIN). I just concern that causes any issue and when
someone wants to mitigate it, it is sad if there is only way to disable
all tracing facilities.

What about providing a sysctl to control the power of the CAP? maybe
it is also good from the viewpoint of system security.

Thank you,

--
Masami Hiramatsu <[email protected]>

2020-01-14 18:08:10

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
<[email protected]> wrote:
> >>
> >> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >> to open it for enabling/disabling kprobes, not for creation.
> >>
> >> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> >> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> >> it should check the max number of probes to be created by something like
> >> ulimit)
> >> I think nowadays we have fixed all such kernel crash problems on x86,
> >> but not sure for other archs, especially on the devices I can not reach.
> >> I need more help to stabilize it.
> >
> > I don't see how enable/disable is any safer than creation.
> > If there are kernel bugs in kprobes the kernel will crash anyway.
> > I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> > CAP_* is about delegation of root privileges to non-root.
> > Delegating some of it is ok, but disallowing creation makes it useless
> > for bpf tracing, so we would need to add another CAP later.
> > Hence I suggest to do it right away instead of breaking
> > sys_perf_even_open() access into two CAPs.
> >
>
> Alexei, Masami,
>
> Thanks for your meaningful input.
> If we know in advance that it still can crash the system in some cases and on
> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> such delegation looks premature until the crashes are avoided. So it looks like
> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> a separate patch set.

perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
imo adding a cap just for pmc is pointless.
if you add a new cap it should cover all of sys_perf_event_open syscall.
subdividing it into sw vs hw counters, kprobe create vs enable, etc will
be the source of ongoing confusion. nack to such cap.

2020-01-14 18:51:50

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 14.01.2020 21:06, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> <[email protected]> wrote:
>>>>
>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>
>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>> it should check the max number of probes to be created by something like
>>>> ulimit)
>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>> but not sure for other archs, especially on the devices I can not reach.
>>>> I need more help to stabilize it.
>>>
>>> I don't see how enable/disable is any safer than creation.
>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>> CAP_* is about delegation of root privileges to non-root.
>>> Delegating some of it is ok, but disallowing creation makes it useless
>>> for bpf tracing, so we would need to add another CAP later.
>>> Hence I suggest to do it right away instead of breaking
>>> sys_perf_even_open() access into two CAPs.
>>>
>>
>> Alexei, Masami,
>>
>> Thanks for your meaningful input.
>> If we know in advance that it still can crash the system in some cases and on
>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>> such delegation looks premature until the crashes are avoided. So it looks like
>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>> a separate patch set.
>
> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.

Sure, software cannot guarantee, but known software bugs could still be fixed,
that's what I meant.

> imo adding a cap just for pmc is pointless.
> if you add a new cap it should cover all of sys_perf_event_open syscall.
> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> be the source of ongoing confusion. nack to such cap.
>

Well, as this patch set already covers complete perf_event_open functionality,
and also eBPF related parts too, could you please review and comment on it?
Does the patches 2/9 and 5/9 already bring all required extentions?

Thanks,
Alexey

2020-01-15 01:53:57

by Alexei Starovoitov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
<[email protected]> wrote:
>
>
> On 14.01.2020 21:06, Alexei Starovoitov wrote:
> > On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> > <[email protected]> wrote:
> >>>>
> >>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >>>> to open it for enabling/disabling kprobes, not for creation.
> >>>>
> >>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> >>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> >>>> it should check the max number of probes to be created by something like
> >>>> ulimit)
> >>>> I think nowadays we have fixed all such kernel crash problems on x86,
> >>>> but not sure for other archs, especially on the devices I can not reach.
> >>>> I need more help to stabilize it.
> >>>
> >>> I don't see how enable/disable is any safer than creation.
> >>> If there are kernel bugs in kprobes the kernel will crash anyway.
> >>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> >>> CAP_* is about delegation of root privileges to non-root.
> >>> Delegating some of it is ok, but disallowing creation makes it useless
> >>> for bpf tracing, so we would need to add another CAP later.
> >>> Hence I suggest to do it right away instead of breaking
> >>> sys_perf_even_open() access into two CAPs.
> >>>
> >>
> >> Alexei, Masami,
> >>
> >> Thanks for your meaningful input.
> >> If we know in advance that it still can crash the system in some cases and on
> >> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> >> such delegation looks premature until the crashes are avoided. So it looks like
> >> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> >> a separate patch set.
> >
> > perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>
> Sure, software cannot guarantee, but known software bugs could still be fixed,
> that's what I meant.
>
> > imo adding a cap just for pmc is pointless.
> > if you add a new cap it should cover all of sys_perf_event_open syscall.
> > subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> > be the source of ongoing confusion. nack to such cap.
> >
>
> Well, as this patch set already covers complete perf_event_open functionality,
> and also eBPF related parts too, could you please review and comment on it?
> Does the patches 2/9 and 5/9 already bring all required extentions?

yes. the current patches 2 and 5 look good to me.
I would only change patch 1 to what Andy was proposing earlier:

static inline bool perfmon_capable(void)
{
if (capable_noaudit(CAP_PERFMON))
return capable(CAP_PERFMON);
if (capable_noaudit(CAP_SYS_ADMIN))
return capable(CAP_SYS_ADMIN);

return capable(CAP_PERFMON);
}
I think Andy was trying to preserve the order of audit events.

I'm also suggesting to drop SYS from the cap name. It doesn't add any value
to the name.

2020-01-15 05:17:21

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 15.01.2020 4:52, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
> <[email protected]> wrote:
>>
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <[email protected]> wrote:
>>>>>>
>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>
>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>> it should check the max number of probes to be created by something like
>>>>>> ulimit)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest to do it right away instead of breaking
>>>>> sys_perf_even_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it still can crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>> that's what I meant.
>>
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers complete perf_event_open functionality,
>> and also eBPF related parts too, could you please review and comment on it?
>> Does the patches 2/9 and 5/9 already bring all required extentions?
>
> yes. the current patches 2 and 5 look good to me.

Thanks. I appreciate your cooperation.

> I would only change patch 1 to what Andy was proposing earlier:

Could you please share the link to the proposal to get more details?
In this patch set discussion there was only this [1] on more generic
naming of PERFMON cap from Andi Kleen.

>
> static inline bool perfmon_capable(void)
> {
> if (capable_noaudit(CAP_PERFMON))
> return capable(CAP_PERFMON);
> if (capable_noaudit(CAP_SYS_ADMIN))
> return capable(CAP_SYS_ADMIN);
>
> return capable(CAP_PERFMON);
> }

Yes, this makes sense and adds up.

> I think Andy was trying to preserve the order of audit events.
>
> I'm also suggesting to drop SYS from the cap name. It doesn't add any value
> to the name.

Agreed, CAP_PERFMON sounds more generic, as it actually is.

Gratefully,
Alexey

[1] https://lore.kernel.org/lkml/[email protected]/

2020-01-15 09:47:08

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

On Tue, 14 Jan 2020 21:50:33 +0300
Alexey Budankov <[email protected]> wrote:

>
> On 14.01.2020 21:06, Alexei Starovoitov wrote:
> > On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
> > <[email protected]> wrote:
> >>>>
> >>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
> >>>> to open it for enabling/disabling kprobes, not for creation.
> >>>>
> >>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
> >>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
> >>>> it should check the max number of probes to be created by something like
> >>>> ulimit)
> >>>> I think nowadays we have fixed all such kernel crash problems on x86,
> >>>> but not sure for other archs, especially on the devices I can not reach.
> >>>> I need more help to stabilize it.
> >>>
> >>> I don't see how enable/disable is any safer than creation.
> >>> If there are kernel bugs in kprobes the kernel will crash anyway.
> >>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
> >>> CAP_* is about delegation of root privileges to non-root.
> >>> Delegating some of it is ok, but disallowing creation makes it useless
> >>> for bpf tracing, so we would need to add another CAP later.
> >>> Hence I suggest to do it right away instead of breaking
> >>> sys_perf_even_open() access into two CAPs.
> >>>
> >>
> >> Alexei, Masami,
> >>
> >> Thanks for your meaningful input.
> >> If we know in advance that it still can crash the system in some cases and on
> >> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
> >> such delegation looks premature until the crashes are avoided. So it looks like
> >> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
> >> a separate patch set.
> >
> > perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>

OK, anyway, for higher security, admin may not give CAP_SYS_PERFMON to
unpriviledged users, since it might allows users to analyze kernel, which
can lead security concerns.

> Sure, software cannot guarantee, but known software bugs could still be fixed,
> that's what I meant.

Agreed, bugs must be fixed anyway.

Thank you,

> > imo adding a cap just for pmc is pointless.
> > if you add a new cap it should cover all of sys_perf_event_open syscall.
> > subdividing it into sw vs hw counters, kprobe create vs enable, etc will
> > be the source of ongoing confusion. nack to such cap.
> >
>
> Well, as this patch set already covers complete perf_event_open functionality,
> and also eBPF related parts too, could you please review and comment on it?
> Does the patches 2/9 and 5/9 already bring all required extentions?
>
> Thanks,
> Alexey


--
Masami Hiramatsu <[email protected]>

2020-01-15 12:12:33

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 15.01.2020 12:45, Masami Hiramatsu wrote:
> On Tue, 14 Jan 2020 21:50:33 +0300
> Alexey Budankov <[email protected]> wrote:
>
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <[email protected]> wrote:
>>>>>>
>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>
>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>> it should check the max number of probes to be created by something like
>>>>>> ulimit)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest to do it right away instead of breaking
>>>>> sys_perf_even_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it still can crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
>
> OK, anyway, for higher security, admin may not give CAP_SYS_PERFMON to
> unpriviledged users, since it might allows users to analyze kernel, which
> can lead security concerns.

FWIW,
Discovered security related hardware issues could be mitigated in software and
here [1] is the official procedure documented on how to follow up, so this could
be a draft plan to approach eBPF perf_events related hardware issues, if required.

[1] https://www.kernel.org/doc/html/latest/process/embargoed-hardware-issues.html

>
>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>> that's what I meant.
>
> Agreed, bugs must be fixed anyway.
>
> Thank you,
>
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers complete perf_event_open functionality,
>> and also eBPF related parts too, could you please review and comment on it?
>> Does the patches 2/9 and 5/9 already bring all required extentions?
>>
>> Thanks,
>> Alexey
>
>

2020-04-01 20:53:03

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process

Hi Alexei,

On 15.01.2020 4:52, Alexei Starovoitov wrote:
> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
> <[email protected]> wrote:
>>
>>
>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>> <[email protected]> wrote:
>>>>>>
>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>
>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>> it should check the max number of probes to be created by something like
>>>>>> ulimit)
>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>> I need more help to stabilize it.
>>>>>
>>>>> I don't see how enable/disable is any safer than creation.
>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>> Hence I suggest to do it right away instead of breaking
>>>>> sys_perf_even_open() access into two CAPs.
>>>>>
>>>>
>>>> Alexei, Masami,
>>>>
>>>> Thanks for your meaningful input.
>>>> If we know in advance that it still can crash the system in some cases and on
>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>> a separate patch set.
>>>
>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>
>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>> that's what I meant.
>>
>>> imo adding a cap just for pmc is pointless.
>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>> be the source of ongoing confusion. nack to such cap.
>>>
>>
>> Well, as this patch set already covers complete perf_event_open functionality,
>> and also eBPF related parts too, could you please review and comment on it?
>> Does the patches 2/9 and 5/9 already bring all required extentions?
>
> yes. the current patches 2 and 5 look good to me.

Could this have you explicit Reviewed-by or Acked-by tag so
the changes could be driven into the kernel?
Latest v7 is here: https://lore.kernel.org/lkml/[email protected]/

Thanks,
Alexey

2020-04-03 13:57:45

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 01.04.2020 23:50, Alexey Budankov wrote:
> Hi Alexei,
>
> On 15.01.2020 4:52, Alexei Starovoitov wrote:
>> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
>> <[email protected]> wrote:
>>>
>>>
>>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>>
>>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>>> it should check the max number of probes to be created by something like
>>>>>>> ulimit)
>>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>>> I need more help to stabilize it.
>>>>>>
>>>>>> I don't see how enable/disable is any safer than creation.
>>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>>> Hence I suggest to do it right away instead of breaking
>>>>>> sys_perf_even_open() access into two CAPs.
>>>>>>
>>>>>
>>>>> Alexei, Masami,
>>>>>
>>>>> Thanks for your meaningful input.
>>>>> If we know in advance that it still can crash the system in some cases and on
>>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>>> a separate patch set.
>>>>
>>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>>
>>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>>> that's what I meant.
>>>
>>>> imo adding a cap just for pmc is pointless.
>>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>>> be the source of ongoing confusion. nack to such cap.
>>>>
>>>
>>> Well, as this patch set already covers complete perf_event_open functionality,
>>> and also eBPF related parts too, could you please review and comment on it?
>>> Does the patches 2/9 and 5/9 already bring all required extentions?
>>
>> yes. the current patches 2 and 5 look good to me.
>
> Could this have you explicit Reviewed-by or Acked-by tag so
> the changes could be driven into the kernel?
> Latest v7 is here: https://lore.kernel.org/lkml/[email protected]/

Posted v8 with all acquired tags so far:
https://lore.kernel.org/lkml/[email protected]/

Thanks,
Alexey

2020-04-03 13:57:45

by Alexey Budankov

[permalink] [raw]
Subject: Re: [PATCH v4 2/9] perf/core: open access for CAP_SYS_PERFMON privileged process


On 01.04.2020 23:50, Alexey Budankov wrote:
> Hi Alexei,
>
> On 15.01.2020 4:52, Alexei Starovoitov wrote:
>> On Tue, Jan 14, 2020 at 10:50 AM Alexey Budankov
>> <[email protected]> wrote:
>>>
>>>
>>> On 14.01.2020 21:06, Alexei Starovoitov wrote:
>>>> On Tue, Jan 14, 2020 at 1:47 AM Alexey Budankov
>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> As we talked at RFC series of CAP_SYS_TRACING last year, I just expected
>>>>>>> to open it for enabling/disabling kprobes, not for creation.
>>>>>>>
>>>>>>> If we can accept user who has no admin priviledge but the CAP_SYS_PERFMON,
>>>>>>> to shoot their foot by their own risk, I'm OK to allow it. (Even though,
>>>>>>> it should check the max number of probes to be created by something like
>>>>>>> ulimit)
>>>>>>> I think nowadays we have fixed all such kernel crash problems on x86,
>>>>>>> but not sure for other archs, especially on the devices I can not reach.
>>>>>>> I need more help to stabilize it.
>>>>>>
>>>>>> I don't see how enable/disable is any safer than creation.
>>>>>> If there are kernel bugs in kprobes the kernel will crash anyway.
>>>>>> I think such partial CAP_SYS_PERFMON would be very confusing to the users.
>>>>>> CAP_* is about delegation of root privileges to non-root.
>>>>>> Delegating some of it is ok, but disallowing creation makes it useless
>>>>>> for bpf tracing, so we would need to add another CAP later.
>>>>>> Hence I suggest to do it right away instead of breaking
>>>>>> sys_perf_even_open() access into two CAPs.
>>>>>>
>>>>>
>>>>> Alexei, Masami,
>>>>>
>>>>> Thanks for your meaningful input.
>>>>> If we know in advance that it still can crash the system in some cases and on
>>>>> some archs, even though root fully controls delegation thru CAP_SYS_PERFMON,
>>>>> such delegation looks premature until the crashes are avoided. So it looks like
>>>>> access to eBPF for CAP_SYS_PERFMON privileged processes is the subject for
>>>>> a separate patch set.
>>>>
>>>> perf_event_open is always dangerous. sw cannot guarantee non-bugginess of hw.
>>>
>>> Sure, software cannot guarantee, but known software bugs could still be fixed,
>>> that's what I meant.
>>>
>>>> imo adding a cap just for pmc is pointless.
>>>> if you add a new cap it should cover all of sys_perf_event_open syscall.
>>>> subdividing it into sw vs hw counters, kprobe create vs enable, etc will
>>>> be the source of ongoing confusion. nack to such cap.
>>>>
>>>
>>> Well, as this patch set already covers complete perf_event_open functionality,
>>> and also eBPF related parts too, could you please review and comment on it?
>>> Does the patches 2/9 and 5/9 already bring all required extentions?
>>
>> yes. the current patches 2 and 5 look good to me.
>
> Could this have you explicit Reviewed-by or Acked-by tag so
> the changes could be driven into the kernel?
> Latest v7 is here: https://lore.kernel.org/lkml/[email protected]/

Posted v8 with all acquired tags so far:
https://lore.kernel.org/lkml/[email protected]/

Thanks,
Alexey