2021-02-23 14:51:40

by Marco Elver

[permalink] [raw]
Subject: [PATCH RFC 0/4] Add support for synchronous signals on perf events

The perf subsystem today unifies various tracing and monitoring
features, from both software and hardware. One benefit of the perf
subsystem is automatically inheriting events to child tasks, which
enables process-wide events monitoring with low overheads. By default
perf events are non-intrusive, not affecting behaviour of the tasks
being monitored.

For certain use-cases, however, it makes sense to leverage the
generality of the perf events subsystem and optionally allow the tasks
being monitored to receive signals on events they are interested in.
This patch series adds the option to synchronously signal user space on
events.

The discussion at [1] led to the changes proposed in this series. The
approach taken in patch 3/4 to use 'event_limit' to trigger the signal
was kindly suggested by Peter Zijlstra in [2].

[1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZD=-XQ@mail.gmail.com/
[2] https://lore.kernel.org/lkml/[email protected]/

Motivation and example uses:

1. Our immediate motivation is low-overhead sampling-based race
detection for user-space [3]. By using perf_event_open() at
process initialization, we can create hardware
breakpoint/watchpoint events that are propagated automatically
to all threads in a process. As far as we are aware, today no
existing kernel facility (such as ptrace) allows us to set up
process-wide watchpoints with minimal overheads (that are
comparable to mprotect() of whole pages).

[3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf

2. Other low-overhead error detectors that rely on detecting
accesses to certain memory locations or code, process-wide and
also only in a specific set of subtasks or threads.

Other example use-cases we found potentially interesting:

3. Code hot patching without full stop-the-world. Specifically, by
setting a code breakpoint to entry to the patched routine, then
send signals to threads and check that they are not in the
routine, but without stopping them further. If any of the
threads will enter the routine, it will receive SIGTRAP and
pause.

4. Safepoints without mprotect(). Some Java implementations use
"load from a known memory location" as a safepoint. When threads
need to be stopped, the page containing the location is
mprotect()ed and threads get a signal. This can be replaced with
a watchpoint, which does not require a whole page nor DTLB
shootdowns.

5. Tracking data flow globally.

6. Threads receiving signals on performance events to
throttle/unthrottle themselves.


Marco Elver (4):
perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
signal: Introduce TRAP_PERF si_code and si_perf to siginfo
perf/core: Add support for SIGTRAP on perf events
perf/core: Add breakpoint information to siginfo on SIGTRAP

arch/m68k/kernel/signal.c | 3 ++
arch/x86/kernel/signal_compat.c | 5 ++-
fs/signalfd.c | 4 +++
include/linux/compat.h | 2 ++
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +++-
include/uapi/linux/perf_event.h | 3 +-
include/uapi/linux/signalfd.h | 4 ++-
kernel/events/core.c | 54 +++++++++++++++++++++++++++++-
kernel/signal.c | 11 ++++++
10 files changed, 88 insertions(+), 5 deletions(-)

--
2.30.0.617.g56c4b15f3c-goog


2021-02-23 14:52:47

by Marco Elver

[permalink] [raw]
Subject: [PATCH RFC 2/4] signal: Introduce TRAP_PERF si_code and si_perf to siginfo

Introduces the TRAP_PERF si_code, and associated siginfo_t field
si_perf. These will be used by the perf event subsystem to send signals
(if requested) to the task where an event occurred.

Signed-off-by: Marco Elver <[email protected]>
---
arch/m68k/kernel/signal.c | 3 +++
arch/x86/kernel/signal_compat.c | 5 ++++-
fs/signalfd.c | 4 ++++
include/linux/compat.h | 2 ++
include/linux/signal.h | 1 +
include/uapi/asm-generic/siginfo.h | 6 +++++-
include/uapi/linux/signalfd.h | 4 +++-
kernel/signal.c | 11 +++++++++++
8 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index 349570f16a78..a4b7ee1df211 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -622,6 +622,9 @@ static inline void siginfo_build_tests(void)
/* _sigfault._addr_pkey */
BUILD_BUG_ON(offsetof(siginfo_t, si_pkey) != 0x12);

+ /* _sigfault._perf */
+ BUILD_BUG_ON(offsetof(siginfo_t, si_perf) != 0x10);
+
/* _sigpoll */
BUILD_BUG_ON(offsetof(siginfo_t, si_band) != 0x0c);
BUILD_BUG_ON(offsetof(siginfo_t, si_fd) != 0x10);
diff --git a/arch/x86/kernel/signal_compat.c b/arch/x86/kernel/signal_compat.c
index a5330ff498f0..0e5d0a7e203b 100644
--- a/arch/x86/kernel/signal_compat.c
+++ b/arch/x86/kernel/signal_compat.c
@@ -29,7 +29,7 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(NSIGFPE != 15);
BUILD_BUG_ON(NSIGSEGV != 9);
BUILD_BUG_ON(NSIGBUS != 5);
- BUILD_BUG_ON(NSIGTRAP != 5);
+ BUILD_BUG_ON(NSIGTRAP != 6);
BUILD_BUG_ON(NSIGCHLD != 6);
BUILD_BUG_ON(NSIGSYS != 2);

@@ -138,6 +138,9 @@ static inline void signal_compat_build_tests(void)
BUILD_BUG_ON(offsetof(siginfo_t, si_pkey) != 0x20);
BUILD_BUG_ON(offsetof(compat_siginfo_t, si_pkey) != 0x14);

+ BUILD_BUG_ON(offsetof(siginfo_t, si_perf) != 0x18);
+ BUILD_BUG_ON(offsetof(compat_siginfo_t, si_perf) != 0x10);
+
CHECK_CSI_OFFSET(_sigpoll);
CHECK_CSI_SIZE (_sigpoll, 2*sizeof(int));
CHECK_SI_SIZE (_sigpoll, 4*sizeof(int));
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 456046e15873..040a1142915f 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -134,6 +134,10 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
#endif
new.ssi_addr_lsb = (short) kinfo->si_addr_lsb;
break;
+ case SIL_PERF_EVENT:
+ new.ssi_addr = (long) kinfo->si_addr;
+ new.ssi_perf = kinfo->si_perf;
+ break;
case SIL_CHLD:
new.ssi_pid = kinfo->si_pid;
new.ssi_uid = kinfo->si_uid;
diff --git a/include/linux/compat.h b/include/linux/compat.h
index 6e65be753603..c8821d966812 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -236,6 +236,8 @@ typedef struct compat_siginfo {
char _dummy_pkey[__COMPAT_ADDR_BND_PKEY_PAD];
u32 _pkey;
} _addr_pkey;
+ /* used when si_code=TRAP_PERF */
+ compat_u64 _perf;
};
} _sigfault;

diff --git a/include/linux/signal.h b/include/linux/signal.h
index 205526c4003a..1e98548d7cf6 100644
--- a/include/linux/signal.h
+++ b/include/linux/signal.h
@@ -43,6 +43,7 @@ enum siginfo_layout {
SIL_FAULT_MCEERR,
SIL_FAULT_BNDERR,
SIL_FAULT_PKUERR,
+ SIL_PERF_EVENT,
SIL_CHLD,
SIL_RT,
SIL_SYS,
diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
index d2597000407a..d0bb9125c853 100644
--- a/include/uapi/asm-generic/siginfo.h
+++ b/include/uapi/asm-generic/siginfo.h
@@ -91,6 +91,8 @@ union __sifields {
char _dummy_pkey[__ADDR_BND_PKEY_PAD];
__u32 _pkey;
} _addr_pkey;
+ /* used when si_code=TRAP_PERF */
+ __u64 _perf;
};
} _sigfault;

@@ -155,6 +157,7 @@ typedef struct siginfo {
#define si_lower _sifields._sigfault._addr_bnd._lower
#define si_upper _sifields._sigfault._addr_bnd._upper
#define si_pkey _sifields._sigfault._addr_pkey._pkey
+#define si_perf _sifields._sigfault._perf
#define si_band _sifields._sigpoll._band
#define si_fd _sifields._sigpoll._fd
#define si_call_addr _sifields._sigsys._call_addr
@@ -253,7 +256,8 @@ typedef struct siginfo {
#define TRAP_BRANCH 3 /* process taken branch trap */
#define TRAP_HWBKPT 4 /* hardware breakpoint/watchpoint */
#define TRAP_UNK 5 /* undiagnosed trap */
-#define NSIGTRAP 5
+#define TRAP_PERF 6 /* perf event with sigtrap=1 */
+#define NSIGTRAP 6

/*
* There is an additional set of SIGTRAP si_codes used by ptrace
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index 83429a05b698..7e333042c7e3 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -39,6 +39,8 @@ struct signalfd_siginfo {
__s32 ssi_syscall;
__u64 ssi_call_addr;
__u32 ssi_arch;
+ __u32 __pad3;
+ __u64 ssi_perf;

/*
* Pad strcture to 128 bytes. Remember to update the
@@ -49,7 +51,7 @@ struct signalfd_siginfo {
* comes out of a read(2) and we really don't want to have
* a compat on read(2).
*/
- __u8 __pad[28];
+ __u8 __pad[16];
};


diff --git a/kernel/signal.c b/kernel/signal.c
index 5ad8566534e7..943c98782634 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1199,6 +1199,7 @@ static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
case SIL_FAULT_MCEERR:
case SIL_FAULT_BNDERR:
case SIL_FAULT_PKUERR:
+ case SIL_PERF_EVENT:
case SIL_SYS:
ret = false;
break;
@@ -2531,6 +2532,7 @@ static void hide_si_addr_tag_bits(struct ksignal *ksig)
case SIL_FAULT_MCEERR:
case SIL_FAULT_BNDERR:
case SIL_FAULT_PKUERR:
+ case SIL_PERF_EVENT:
ksig->info.si_addr = arch_untagged_si_addr(
ksig->info.si_addr, ksig->sig, ksig->info.si_code);
break;
@@ -3333,6 +3335,10 @@ void copy_siginfo_to_external32(struct compat_siginfo *to,
#endif
to->si_pkey = from->si_pkey;
break;
+ case SIL_PERF_EVENT:
+ to->si_addr = ptr_to_compat(from->si_addr);
+ to->si_perf = from->si_perf;
+ break;
case SIL_CHLD:
to->si_pid = from->si_pid;
to->si_uid = from->si_uid;
@@ -3413,6 +3419,10 @@ static int post_copy_siginfo_from_user32(kernel_siginfo_t *to,
#endif
to->si_pkey = from->si_pkey;
break;
+ case SIL_PERF_EVENT:
+ to->si_addr = compat_ptr(from->si_addr);
+ to->si_perf = from->si_perf;
+ break;
case SIL_CHLD:
to->si_pid = from->si_pid;
to->si_uid = from->si_uid;
@@ -4593,6 +4603,7 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_lower);
CHECK_OFFSET(si_upper);
CHECK_OFFSET(si_pkey);
+ CHECK_OFFSET(si_perf);

/* sigpoll */
CHECK_OFFSET(si_band);
--
2.30.0.617.g56c4b15f3c-goog

2021-02-23 14:55:15

by Marco Elver

[permalink] [raw]
Subject: [PATCH RFC 4/4] perf/core: Add breakpoint information to siginfo on SIGTRAP

Encode information from breakpoint attributes into siginfo_t, which
helps disambiguate which breakpoint fired.

Note, providing the event fd may be unreliable, since the event may have
been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
triggering and the signal being delivered to user space.

Signed-off-by: Marco Elver <[email protected]>
---
kernel/events/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8718763045fd..d7908322d796 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6296,6 +6296,17 @@ static void perf_sigtrap(struct perf_event *event)
info.si_signo = SIGTRAP;
info.si_code = TRAP_PERF;
info.si_errno = event->attr.type;
+
+ switch (event->attr.type) {
+ case PERF_TYPE_BREAKPOINT:
+ info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
+ info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
+ break;
+ default:
+ /* No additional info set. */
+ break;
+ }
+
force_sig_info(&info);
}

--
2.30.0.617.g56c4b15f3c-goog

2021-02-23 15:07:41

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] perf/core: Add breakpoint information to siginfo on SIGTRAP

On Tue, Feb 23, 2021 at 3:34 PM Marco Elver <[email protected]> wrote:
>
> Encode information from breakpoint attributes into siginfo_t, which
> helps disambiguate which breakpoint fired.
>
> Note, providing the event fd may be unreliable, since the event may have
> been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> triggering and the signal being delivered to user space.
>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> kernel/events/core.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8718763045fd..d7908322d796 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6296,6 +6296,17 @@ static void perf_sigtrap(struct perf_event *event)
> info.si_signo = SIGTRAP;
> info.si_code = TRAP_PERF;
> info.si_errno = event->attr.type;
> +
> + switch (event->attr.type) {
> + case PERF_TYPE_BREAKPOINT:
> + info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> + info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> + break;
> + default:
> + /* No additional info set. */

Should we prohibit using attr.sigtrap for !PERF_TYPE_BREAKPOINT if we
don't know what info to pass yet?

> + break;
> + }
> +
> force_sig_info(&info);
> }
>
> --
> 2.30.0.617.g56c4b15f3c-goog
>

2021-02-23 15:14:26

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] perf/core: Add breakpoint information to siginfo on SIGTRAP

On Tue, 23 Feb 2021 at 16:01, Dmitry Vyukov <[email protected]> wrote:
>
> On Tue, Feb 23, 2021 at 3:34 PM Marco Elver <[email protected]> wrote:
> >
> > Encode information from breakpoint attributes into siginfo_t, which
> > helps disambiguate which breakpoint fired.
> >
> > Note, providing the event fd may be unreliable, since the event may have
> > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > triggering and the signal being delivered to user space.
> >
> > Signed-off-by: Marco Elver <[email protected]>
> > ---
> > kernel/events/core.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > index 8718763045fd..d7908322d796 100644
> > --- a/kernel/events/core.c
> > +++ b/kernel/events/core.c
> > @@ -6296,6 +6296,17 @@ static void perf_sigtrap(struct perf_event *event)
> > info.si_signo = SIGTRAP;
> > info.si_code = TRAP_PERF;
> > info.si_errno = event->attr.type;
> > +
> > + switch (event->attr.type) {
> > + case PERF_TYPE_BREAKPOINT:
> > + info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > + info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> > + break;
> > + default:
> > + /* No additional info set. */
>
> Should we prohibit using attr.sigtrap for !PERF_TYPE_BREAKPOINT if we
> don't know what info to pass yet?

I don't think it's necessary. This way, by default we get support for
other perf events. If user space observes si_perf==0, then there's no
information available. That would require that any event type that
sets si_perf in future, must ensure that it sets si_perf!=0.

I can add a comment to document the requirement here (and user space
facing documentation should get a copy of how the info is encoded,
too).

Alternatively, we could set si_errno to 0 if no info is available, at
the cost of losing the type information for events not explicitly
listed here.

What do you prefer?

> > + break;
> > + }
> > +
> > force_sig_info(&info);
> > }
> >
> > --
> > 2.30.0.617.g56c4b15f3c-goog
> >

2021-02-23 15:18:53

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] perf/core: Add breakpoint information to siginfo on SIGTRAP

On Tue, Feb 23, 2021 at 4:10 PM 'Marco Elver' via kasan-dev
<[email protected]> wrote:
> > > Encode information from breakpoint attributes into siginfo_t, which
> > > helps disambiguate which breakpoint fired.
> > >
> > > Note, providing the event fd may be unreliable, since the event may have
> > > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > > triggering and the signal being delivered to user space.
> > >
> > > Signed-off-by: Marco Elver <[email protected]>
> > > ---
> > > kernel/events/core.c | 11 +++++++++++
> > > 1 file changed, 11 insertions(+)
> > >
> > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > index 8718763045fd..d7908322d796 100644
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -6296,6 +6296,17 @@ static void perf_sigtrap(struct perf_event *event)
> > > info.si_signo = SIGTRAP;
> > > info.si_code = TRAP_PERF;
> > > info.si_errno = event->attr.type;
> > > +
> > > + switch (event->attr.type) {
> > > + case PERF_TYPE_BREAKPOINT:
> > > + info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > > + info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> > > + break;
> > > + default:
> > > + /* No additional info set. */
> >
> > Should we prohibit using attr.sigtrap for !PERF_TYPE_BREAKPOINT if we
> > don't know what info to pass yet?
>
> I don't think it's necessary. This way, by default we get support for
> other perf events. If user space observes si_perf==0, then there's no
> information available. That would require that any event type that
> sets si_perf in future, must ensure that it sets si_perf!=0.
>
> I can add a comment to document the requirement here (and user space
> facing documentation should get a copy of how the info is encoded,
> too).
>
> Alternatively, we could set si_errno to 0 if no info is available, at
> the cost of losing the type information for events not explicitly
> listed here.
>
> What do you prefer?

Ah, I see.
Let's wait for the opinions of other people. There are a number of
options for how to approach this.

> > > + break;
> > > + }
> > > +
> > > force_sig_info(&info);
> > > }
> > >
> > > --
> > > 2.30.0.617.g56c4b15f3c-goog
> > >
>
> --
> You received this message because you are subscribed to the Google Groups "kasan-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/kasan-dev/CANpmjNP1wQvG0SNPP2L9QO%3Dnatf0XU8HXj-r2_-U4QZxtr-dVA%40mail.gmail.com.

2021-02-23 17:19:19

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH RFC 4/4] perf/core: Add breakpoint information to siginfo on SIGTRAP

On Tue, 23 Feb 2021 at 16:16, Dmitry Vyukov <[email protected]> wrote:
>
> On Tue, Feb 23, 2021 at 4:10 PM 'Marco Elver' via kasan-dev
> <[email protected]> wrote:
> > > > Encode information from breakpoint attributes into siginfo_t, which
> > > > helps disambiguate which breakpoint fired.
> > > >
> > > > Note, providing the event fd may be unreliable, since the event may have
> > > > been modified (via PERF_EVENT_IOC_MODIFY_ATTRIBUTES) between the event
> > > > triggering and the signal being delivered to user space.
> > > >
> > > > Signed-off-by: Marco Elver <[email protected]>
> > > > ---
> > > > kernel/events/core.c | 11 +++++++++++
> > > > 1 file changed, 11 insertions(+)
> > > >
> > > > diff --git a/kernel/events/core.c b/kernel/events/core.c
> > > > index 8718763045fd..d7908322d796 100644
> > > > --- a/kernel/events/core.c
> > > > +++ b/kernel/events/core.c
> > > > @@ -6296,6 +6296,17 @@ static void perf_sigtrap(struct perf_event *event)
> > > > info.si_signo = SIGTRAP;
> > > > info.si_code = TRAP_PERF;
> > > > info.si_errno = event->attr.type;
> > > > +
> > > > + switch (event->attr.type) {
> > > > + case PERF_TYPE_BREAKPOINT:
> > > > + info.si_addr = (void *)(unsigned long)event->attr.bp_addr;
> > > > + info.si_perf = (event->attr.bp_len << 16) | (u64)event->attr.bp_type;
> > > > + break;
> > > > + default:
> > > > + /* No additional info set. */
> > >
> > > Should we prohibit using attr.sigtrap for !PERF_TYPE_BREAKPOINT if we
> > > don't know what info to pass yet?
> >
> > I don't think it's necessary. This way, by default we get support for
> > other perf events. If user space observes si_perf==0, then there's no
> > information available. That would require that any event type that
> > sets si_perf in future, must ensure that it sets si_perf!=0.
> >
> > I can add a comment to document the requirement here (and user space
> > facing documentation should get a copy of how the info is encoded,
> > too).
> >
> > Alternatively, we could set si_errno to 0 if no info is available, at
> > the cost of losing the type information for events not explicitly
> > listed here.

Note that PERF_TYPE_HARDWARE == 0, so setting si_errno to 0 does not
work. Which leaves us with:

1. Ensure si_perf==0 (or some other magic value) if no info is
available and !=0 otherwise.

2. Return error for events where we do not officially support
requesting sigtrap.

I'm currently leaning towards (1).

> > What do you prefer?
>
> Ah, I see.
> Let's wait for the opinions of other people. There are a number of
> options for how to approach this.

2021-02-23 18:04:24

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH RFC 2/4] signal: Introduce TRAP_PERF si_code and si_perf to siginfo

On Tue, Feb 23, 2021 at 3:52 PM Marco Elver <[email protected]> wrote:
> Introduces the TRAP_PERF si_code, and associated siginfo_t field
> si_perf. These will be used by the perf event subsystem to send signals
> (if requested) to the task where an event occurred.
>
> Signed-off-by: Marco Elver <[email protected]>

> arch/m68k/kernel/signal.c | 3 +++

Acked-by: Geert Uytterhoeven <[email protected]>

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2021-02-23 20:02:27

by Marco Elver

[permalink] [raw]
Subject: [PATCH RFC 1/4] perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children

As with other ioctls (such as PERF_EVENT_IOC_{ENABLE,DISABLE}), fix up
handling of PERF_EVENT_IOC_MODIFY_ATTRIBUTES to also apply to children.

Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
kernel/events/core.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 129dee540a8b..37a8297be164 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3179,16 +3179,36 @@ static int perf_event_modify_breakpoint(struct perf_event *bp,
static int perf_event_modify_attr(struct perf_event *event,
struct perf_event_attr *attr)
{
+ int (*func)(struct perf_event *, struct perf_event_attr *);
+ struct perf_event *child;
+ int err;
+
if (event->attr.type != attr->type)
return -EINVAL;

switch (event->attr.type) {
case PERF_TYPE_BREAKPOINT:
- return perf_event_modify_breakpoint(event, attr);
+ func = perf_event_modify_breakpoint;
+ break;
default:
/* Place holder for future additions. */
return -EOPNOTSUPP;
}
+
+ WARN_ON_ONCE(event->ctx->parent_ctx);
+
+ mutex_lock(&event->child_mutex);
+ err = func(event, attr);
+ if (err)
+ goto out;
+ list_for_each_entry(child, &event->child_list, child_list) {
+ err = func(child, attr);
+ if (err)
+ goto out;
+ }
+out:
+ mutex_unlock(&event->child_mutex);
+ return err;
}

static void ctx_sched_out(struct perf_event_context *ctx,
--
2.30.0.617.g56c4b15f3c-goog

2021-02-23 20:02:43

by Marco Elver

[permalink] [raw]
Subject: [PATCH RFC 3/4] perf/core: Add support for SIGTRAP on perf events

Adds bit perf_event_attr::sigtrap, which can be set to cause events to
send SIGTRAP (with si_code TRAP_PERF) to the task where the event
occurred. To distinguish perf events and allow user space to decode
si_perf (if set), the event type is set in si_errno.

The primary motivation is to support synchronous signals on perf events
in the task where an event (such as breakpoints) triggered.

Link: https://lore.kernel.org/lkml/[email protected]/
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Marco Elver <[email protected]>
---
include/uapi/linux/perf_event.h | 3 ++-
kernel/events/core.c | 21 +++++++++++++++++++++
2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index ad15e40d7f5d..b9cc6829a40c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -389,7 +389,8 @@ struct perf_event_attr {
cgroup : 1, /* include cgroup events */
text_poke : 1, /* include text poke events */
build_id : 1, /* use build id in mmap2 events */
- __reserved_1 : 29;
+ sigtrap : 1, /* send synchronous SIGTRAP on event */
+ __reserved_1 : 28;

union {
__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 37a8297be164..8718763045fd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6288,6 +6288,17 @@ void perf_event_wakeup(struct perf_event *event)
}
}

+static void perf_sigtrap(struct perf_event *event)
+{
+ struct kernel_siginfo info;
+
+ clear_siginfo(&info);
+ info.si_signo = SIGTRAP;
+ info.si_code = TRAP_PERF;
+ info.si_errno = event->attr.type;
+ force_sig_info(&info);
+}
+
static void perf_pending_event_disable(struct perf_event *event)
{
int cpu = READ_ONCE(event->pending_disable);
@@ -6297,6 +6308,13 @@ static void perf_pending_event_disable(struct perf_event *event)

if (cpu == smp_processor_id()) {
WRITE_ONCE(event->pending_disable, -1);
+
+ if (event->attr.sigtrap) {
+ atomic_inc(&event->event_limit); /* rearm event */
+ perf_sigtrap(event);
+ return;
+ }
+
perf_event_disable_local(event);
return;
}
@@ -11325,6 +11343,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,

event->state = PERF_EVENT_STATE_INACTIVE;

+ if (event->attr.sigtrap)
+ atomic_set(&event->event_limit, 1);
+
if (task) {
event->attach_state = PERF_ATTACH_TASK;
/*
--
2.30.0.617.g56c4b15f3c-goog

2021-02-23 20:37:24

by Dmitry Vyukov

[permalink] [raw]
Subject: Re: [PATCH RFC 3/4] perf/core: Add support for SIGTRAP on perf events

On Tue, Feb 23, 2021 at 3:34 PM Marco Elver <[email protected]> wrote:
>
> Adds bit perf_event_attr::sigtrap, which can be set to cause events to
> send SIGTRAP (with si_code TRAP_PERF) to the task where the event
> occurred. To distinguish perf events and allow user space to decode
> si_perf (if set), the event type is set in si_errno.
>
> The primary motivation is to support synchronous signals on perf events
> in the task where an event (such as breakpoints) triggered.
>
> Link: https://lore.kernel.org/lkml/[email protected]/
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Marco Elver <[email protected]>
> ---
> include/uapi/linux/perf_event.h | 3 ++-
> kernel/events/core.c | 21 +++++++++++++++++++++
> 2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index ad15e40d7f5d..b9cc6829a40c 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -389,7 +389,8 @@ struct perf_event_attr {
> cgroup : 1, /* include cgroup events */
> text_poke : 1, /* include text poke events */
> build_id : 1, /* use build id in mmap2 events */
> - __reserved_1 : 29;
> + sigtrap : 1, /* send synchronous SIGTRAP on event */
> + __reserved_1 : 28;
>
> union {
> __u32 wakeup_events; /* wakeup every n events */
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 37a8297be164..8718763045fd 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6288,6 +6288,17 @@ void perf_event_wakeup(struct perf_event *event)
> }
> }
>
> +static void perf_sigtrap(struct perf_event *event)
> +{
> + struct kernel_siginfo info;
> +
> + clear_siginfo(&info);
> + info.si_signo = SIGTRAP;
> + info.si_code = TRAP_PERF;
> + info.si_errno = event->attr.type;
> + force_sig_info(&info);
> +}
> +
> static void perf_pending_event_disable(struct perf_event *event)
> {
> int cpu = READ_ONCE(event->pending_disable);
> @@ -6297,6 +6308,13 @@ static void perf_pending_event_disable(struct perf_event *event)
>
> if (cpu == smp_processor_id()) {
> WRITE_ONCE(event->pending_disable, -1);
> +
> + if (event->attr.sigtrap) {
> + atomic_inc(&event->event_limit); /* rearm event */

We send the signal to the current task. Can this fire outside of the
current task context? E.g. in interrupt/softirq/etc? And then we will
send the signal to the current task. Watchpoint can be set to
userspace address and then something asynchronous (some IO completion)
that does not belong to this task access the userspace address (is
this possible?). But watchpoints can also be set to kernel addresses,
then another context can definitely access it.
(1) can this happen? maybe perf context is somehow disabled when !in_task()?
(2) if yes, what is the desired behavior?




> + perf_sigtrap(event);
> + return;
> + }
> +
> perf_event_disable_local(event);
> return;
> }
> @@ -11325,6 +11343,9 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>
> event->state = PERF_EVENT_STATE_INACTIVE;
>
> + if (event->attr.sigtrap)
> + atomic_set(&event->event_limit, 1);
> +
> if (task) {
> event->attach_state = PERF_ATTACH_TASK;
> /*
> --
> 2.30.0.617.g56c4b15f3c-goog
>

2021-02-23 21:04:48

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [PATCH RFC 2/4] signal: Introduce TRAP_PERF si_code and si_perf to siginfo

On Tue, Feb 23, 2021 at 7:01 PM Geert Uytterhoeven <[email protected]> wrote:
>
> On Tue, Feb 23, 2021 at 3:52 PM Marco Elver <[email protected]> wrote:
> > Introduces the TRAP_PERF si_code, and associated siginfo_t field
> > si_perf. These will be used by the perf event subsystem to send signals
> > (if requested) to the task where an event occurred.
> >
> > Signed-off-by: Marco Elver <[email protected]>
>
> > arch/m68k/kernel/signal.c | 3 +++
>
> Acked-by: Geert Uytterhoeven <[email protected]>
>

For asm-generic:

Acked-by: Arnd Bergmann <[email protected]>

2021-02-23 23:51:59

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH RFC 0/4] Add support for synchronous signals on perf events

On Tue, 23 Feb 2021 at 15:34, Marco Elver <[email protected]> wrote:
>
> The perf subsystem today unifies various tracing and monitoring
> features, from both software and hardware. One benefit of the perf
> subsystem is automatically inheriting events to child tasks, which
> enables process-wide events monitoring with low overheads. By default
> perf events are non-intrusive, not affecting behaviour of the tasks
> being monitored.
>
> For certain use-cases, however, it makes sense to leverage the
> generality of the perf events subsystem and optionally allow the tasks
> being monitored to receive signals on events they are interested in.
> This patch series adds the option to synchronously signal user space on
> events.
>
> The discussion at [1] led to the changes proposed in this series. The
> approach taken in patch 3/4 to use 'event_limit' to trigger the signal
> was kindly suggested by Peter Zijlstra in [2].
>
> [1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZD=-XQ@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> Motivation and example uses:
>
> 1. Our immediate motivation is low-overhead sampling-based race
> detection for user-space [3]. By using perf_event_open() at
> process initialization, we can create hardware
> breakpoint/watchpoint events that are propagated automatically
> to all threads in a process. As far as we are aware, today no
> existing kernel facility (such as ptrace) allows us to set up
> process-wide watchpoints with minimal overheads (that are
> comparable to mprotect() of whole pages).
>
> [3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
>
> 2. Other low-overhead error detectors that rely on detecting
> accesses to certain memory locations or code, process-wide and
> also only in a specific set of subtasks or threads.
>
> Other example use-cases we found potentially interesting:
>
> 3. Code hot patching without full stop-the-world. Specifically, by
> setting a code breakpoint to entry to the patched routine, then
> send signals to threads and check that they are not in the
> routine, but without stopping them further. If any of the
> threads will enter the routine, it will receive SIGTRAP and
> pause.
>
> 4. Safepoints without mprotect(). Some Java implementations use
> "load from a known memory location" as a safepoint. When threads
> need to be stopped, the page containing the location is
> mprotect()ed and threads get a signal. This can be replaced with
> a watchpoint, which does not require a whole page nor DTLB
> shootdowns.
>
> 5. Tracking data flow globally.
>
> 6. Threads receiving signals on performance events to
> throttle/unthrottle themselves.
>
>
> Marco Elver (4):
> perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
> signal: Introduce TRAP_PERF si_code and si_perf to siginfo
> perf/core: Add support for SIGTRAP on perf events
> perf/core: Add breakpoint information to siginfo on SIGTRAP

Note that we're currently pondering fork + exec, and suggestions would
be appreciated. We think we'll need some restrictions, like Peter
proposed here: here:
https://lore.kernel.org/lkml/YBvj6eJR%[email protected]/

We think what we want is to inherit the events to children only if
cloned with CLONE_SIGHAND. If there's space for a 'inherit_mask' in
perf_event_attr, that'd be most flexible, but perhaps we do not have
the space.

Thanks,
-- Marco

>
> arch/m68k/kernel/signal.c | 3 ++
> arch/x86/kernel/signal_compat.c | 5 ++-
> fs/signalfd.c | 4 +++
> include/linux/compat.h | 2 ++
> include/linux/signal.h | 1 +
> include/uapi/asm-generic/siginfo.h | 6 +++-
> include/uapi/linux/perf_event.h | 3 +-
> include/uapi/linux/signalfd.h | 4 ++-
> kernel/events/core.c | 54 +++++++++++++++++++++++++++++-
> kernel/signal.c | 11 ++++++
> 10 files changed, 88 insertions(+), 5 deletions(-)
>
> --
> 2.30.0.617.g56c4b15f3c-goog
>

2021-02-23 23:58:27

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH RFC 0/4] Add support for synchronous signals on perf events


> On Feb 23, 2021, at 6:34 AM, Marco Elver <[email protected]> wrote:
>
> The perf subsystem today unifies various tracing and monitoring
> features, from both software and hardware. One benefit of the perf
> subsystem is automatically inheriting events to child tasks, which
> enables process-wide events monitoring with low overheads. By default
> perf events are non-intrusive, not affecting behaviour of the tasks
> being monitored.
>
> For certain use-cases, however, it makes sense to leverage the
> generality of the perf events subsystem and optionally allow the tasks
> being monitored to receive signals on events they are interested in.
> This patch series adds the option to synchronously signal user space on
> events.

Unless I missed some machinations, which is entirely possible, you can’t call force_sig_info() from NMI context. Not only am I not convinced that the core signal code is NMI safe, but at least x86 can’t correctly deliver signals on NMI return. You probably need an IPI-to-self.

>
> The discussion at [1] led to the changes proposed in this series. The
> approach taken in patch 3/4 to use 'event_limit' to trigger the signal
> was kindly suggested by Peter Zijlstra in [2].
>
> [1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZD=-XQ@mail.gmail.com/
> [2] https://lore.kernel.org/lkml/[email protected]/
>
> Motivation and example uses:
>
> 1. Our immediate motivation is low-overhead sampling-based race
> detection for user-space [3]. By using perf_event_open() at
> process initialization, we can create hardware
> breakpoint/watchpoint events that are propagated automatically
> to all threads in a process. As far as we are aware, today no
> existing kernel facility (such as ptrace) allows us to set up
> process-wide watchpoints with minimal overheads (that are
> comparable to mprotect() of whole pages).

This would be doable much more simply with an API to set a breakpoint. All the machinery exists except the actual user API.

> [3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
>
> 2. Other low-overhead error detectors that rely on detecting
> accesses to certain memory locations or code, process-wide and
> also only in a specific set of subtasks or threads.
>
> Other example use-cases we found potentially interesting:
>
> 3. Code hot patching without full stop-the-world. Specifically, by
> setting a code breakpoint to entry to the patched routine, then
> send signals to threads and check that they are not in the
> routine, but without stopping them further. If any of the
> threads will enter the routine, it will receive SIGTRAP and
> pause.

Cute.

>
> 4. Safepoints without mprotect(). Some Java implementations use
> "load from a known memory location" as a safepoint. When threads
> need to be stopped, the page containing the location is
> mprotect()ed and threads get a signal. This can be replaced with
> a watchpoint, which does not require a whole page nor DTLB
> shootdowns.

I’m skeptical. Propagating a hardware breakpoint to all threads involves IPIs and horribly slow writes to DR1 (or 2, 3, or 4) and DR7. A TLB flush can be accelerated using paravirt or hypothetical future hardware. Or real live hardware on ARM64.

(The hypothetical future hardware is almost present on Zen 3. A bit of work is needed on the hardware end to make it useful.)

>
> 5. Tracking data flow globally.
>
> 6. Threads receiving signals on performance events to
> throttle/unthrottle themselves.
>
> Marco Elver (4):
> perf/core: Apply PERF_EVENT_IOC_MODIFY_ATTRIBUTES to children
> signal: Introduce TRAP_PERF si_code and si_perf to siginfo
> perf/core: Add support for SIGTRAP on perf events
> perf/core: Add breakpoint information to siginfo on SIGTRAP
>
> arch/m68k/kernel/signal.c | 3 ++
> arch/x86/kernel/signal_compat.c | 5 ++-
> fs/signalfd.c | 4 +++
> include/linux/compat.h | 2 ++
> include/linux/signal.h | 1 +
> include/uapi/asm-generic/siginfo.h | 6 +++-
> include/uapi/linux/perf_event.h | 3 +-
> include/uapi/linux/signalfd.h | 4 ++-
> kernel/events/core.c | 54 +++++++++++++++++++++++++++++-
> kernel/signal.c | 11 ++++++
> 10 files changed, 88 insertions(+), 5 deletions(-)
>
> --
> 2.30.0.617.g56c4b15f3c-goog
>

2021-02-24 00:34:53

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH RFC 0/4] Add support for synchronous signals on perf events

On Tue, 23 Feb 2021 at 21:27, Andy Lutomirski <[email protected]> wrote:
> > On Feb 23, 2021, at 6:34 AM, Marco Elver <[email protected]> wrote:
> >
> > The perf subsystem today unifies various tracing and monitoring
> > features, from both software and hardware. One benefit of the perf
> > subsystem is automatically inheriting events to child tasks, which
> > enables process-wide events monitoring with low overheads. By default
> > perf events are non-intrusive, not affecting behaviour of the tasks
> > being monitored.
> >
> > For certain use-cases, however, it makes sense to leverage the
> > generality of the perf events subsystem and optionally allow the tasks
> > being monitored to receive signals on events they are interested in.
> > This patch series adds the option to synchronously signal user space on
> > events.
>
> Unless I missed some machinations, which is entirely possible, you can’t call force_sig_info() from NMI context. Not only am I not convinced that the core signal code is NMI safe, but at least x86 can’t correctly deliver signals on NMI return. You probably need an IPI-to-self.

force_sig_info() is called from an irq_work only: perf_pending_event
-> perf_pending_event_disable -> perf_sigtrap -> force_sig_info. What
did I miss?

> > The discussion at [1] led to the changes proposed in this series. The
> > approach taken in patch 3/4 to use 'event_limit' to trigger the signal
> > was kindly suggested by Peter Zijlstra in [2].
> >
> > [1] https://lore.kernel.org/lkml/CACT4Y+YPrXGw+AtESxAgPyZ84TYkNZdP0xpocX2jwVAbZD=-XQ@mail.gmail.com/
> > [2] https://lore.kernel.org/lkml/[email protected]/
> >
> > Motivation and example uses:
> >
> > 1. Our immediate motivation is low-overhead sampling-based race
> > detection for user-space [3]. By using perf_event_open() at
> > process initialization, we can create hardware
> > breakpoint/watchpoint events that are propagated automatically
> > to all threads in a process. As far as we are aware, today no
> > existing kernel facility (such as ptrace) allows us to set up
> > process-wide watchpoints with minimal overheads (that are
> > comparable to mprotect() of whole pages).
>
> This would be doable much more simply with an API to set a breakpoint. All the machinery exists except the actual user API.

Isn't perf_event_open() that API?

A new user API implementation will either be a thin wrapper around
perf events or reinvent half of perf events to deal with managing
watchpoints across a set of tasks (process-wide or some subset).

It's not just breakpoints though.

> > [3] https://llvm.org/devmtg/2020-09/slides/Morehouse-GWP-Tsan.pdf
> >
> > 2. Other low-overhead error detectors that rely on detecting
> > accesses to certain memory locations or code, process-wide and
> > also only in a specific set of subtasks or threads.
> >
> > Other example use-cases we found potentially interesting:
> >
> > 3. Code hot patching without full stop-the-world. Specifically, by
> > setting a code breakpoint to entry to the patched routine, then
> > send signals to threads and check that they are not in the
> > routine, but without stopping them further. If any of the
> > threads will enter the routine, it will receive SIGTRAP and
> > pause.
>
> Cute.
>
> >
> > 4. Safepoints without mprotect(). Some Java implementations use
> > "load from a known memory location" as a safepoint. When threads
> > need to be stopped, the page containing the location is
> > mprotect()ed and threads get a signal. This can be replaced with
> > a watchpoint, which does not require a whole page nor DTLB
> > shootdowns.
>
> I’m skeptical. Propagating a hardware breakpoint to all threads involves IPIs and horribly slow writes to DR1 (or 2, 3, or 4) and DR7. A TLB flush can be accelerated using paravirt or hypothetical future hardware. Or real live hardware on ARM64.
>
> (The hypothetical future hardware is almost present on Zen 3. A bit of work is needed on the hardware end to make it useful.)

Fair enough. Although watchpoints can be much more fine-grained than
an mprotect() which then also has downsides (checking if the accessed
memory was actually the bytes we're interested in). Maybe we should
also ask CPU vendors to give us better watchpoints (perhaps start with
more of them, and easier to set in batch)? We still need a user space
API...

Thanks,
-- Marco



> >
> > 5. Tracking data flow globally.
> >
> > 6. Threads receiving signals on performance events to
> > throttle/unthrottle themselves.