2021-05-11 00:39:03

by Robert O'Callahan

Subject: Userspace notifications for observing userfaultfd faults

For rr (https://rr-project.org) to support recording and replaying
applications that use userfaultfd, we need to observe that a task we
are controlling has blocked on a userfault. Currently this is very
difficult to do, especially if a task blocks on a userfault on a page
where some other task has already triggered a userfault, so no new
userfaultfd event is generated. We also need to observe which page has
been faulted on so we can determine when the fault has been serviced
and the task is ready to run again.

I've tried to find workarounds with existing APIs and it doesn't seem
tractable. See https://github.com/rr-debugger/rr/issues/2852#issuecomment-837514946
for some thoughts about that.

It seems to me that a sufficient API for us would be a new software
perf event, e.g. PERF_COUNT_SW_USERFAULTS, with an associated
PERF_SAMPLE_ADDR that would give us the address of the page. Does that
sound like a reasonable thing to add?
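
To illustrate why the faulting address matters, here is a minimal Python
sketch of the bookkeeping a tracer could do with such samples. Everything
in it is hypothetical: PERF_COUNT_SW_USERFAULTS does not exist today, and
the names UserfaultTracker, on_userfault_sample and on_range_serviced are
illustrative, not rr's code. It assumes each sample yields a task id and a
faulting address, and that the tracer learns separately which address range
has been populated.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # assumption: 4 KiB pages; a real tracer would query the page size


def page_base(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


class UserfaultTracker:
    """Hypothetical bookkeeping: which tasks are blocked on which page."""

    def __init__(self) -> None:
        # Maps page base address -> ids of tasks blocked on that page.
        self._blocked: dict[int, set[int]] = {}

    def on_userfault_sample(self, tid: int, addr: int) -> None:
        # Several tasks may block on the same page. Record every one of them,
        # including tasks that fault after another task already triggered
        # the userfault on that page.
        self._blocked.setdefault(page_base(addr), set()).add(tid)

    def on_range_serviced(self, start: int, length: int) -> set[int]:
        # Collect the tasks whose page lies in [start, start + length) and
        # forget them, since they are ready to run again.
        ready: set[int] = set()
        for page in [p for p in self._blocked if start <= p < start + length]:
            ready |= self._blocked.pop(page)
        return ready
```

The point of keying on the page address is that a second task blocking on
an already-faulted page is still recorded, which is the case described
above as hard to observe today.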

Robert O'Callahan


2021-05-11 18:14:17

by Axel Rasmussen

Subject: Re: Userspace notifications for observing userfaultfd faults

On Mon, May 10, 2021 at 5:38 PM Robert O'Callahan <[email protected]> wrote:
>
> For rr (https://rr-project.org) to support recording and replaying
> applications that use userfaultfd, we need to observe that a task we
> are controlling has blocked on a userfault. Currently this is very
> difficult to do, especially if a task blocks on a userfault on a page
> where some other task has already triggered a userfault, so no new
> userfaultfd event is generated. We also need to observe which page has
> been faulted on so we can determine when the fault has been serviced
> and the task is ready to run again.
>
> I've tried to find workarounds with existing APIs and it doesn't seem
> tractable. See https://github.com/rr-debugger/rr/issues/2852#issuecomment-837514946
> for some thoughts about that.
>
> It seems to me that a sufficient API for us would be a new software
> perf event, e.g. PERF_COUNT_SW_USERFAULTS, with an associated
> PERF_SAMPLE_ADDR that would give us the address of the page. Does that
> sound like a reasonable thing to add?

Is some combination of bpf and kprobes a possible solution? There are
some seemingly relevant examples here:
https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md

I haven't tried it, but it seems like attaching to handle_userfault()
would give similar information to PERF_COUNT_SW_PAGE_FAULTS, but for
userfaults.

>
> Robert O'Callahan

2021-05-11 18:26:42

by Kyle Huey

Subject: Re: Userspace notifications for observing userfaultfd faults

On Tue, May 11, 2021 at 11:12 AM Axel Rasmussen
<[email protected]> wrote:
>
> On Mon, May 10, 2021 at 5:38 PM Robert O'Callahan <[email protected]> wrote:
> >
> > For rr (https://rr-project.org) to support recording and replaying
> > applications that use userfaultfd, we need to observe that a task we
> > are controlling has blocked on a userfault. Currently this is very
> > difficult to do, especially if a task blocks on a userfault on a page
> > where some other task has already triggered a userfault, so no new
> > userfaultfd event is generated. We also need to observe which page has
> > been faulted on so we can determine when the fault has been serviced
> > and the task is ready to run again.
> >
> > I've tried to find workarounds with existing APIs and it doesn't seem
> > tractable. See https://github.com/rr-debugger/rr/issues/2852#issuecomment-837514946
> > for some thoughts about that.
> >
> > It seems to me that a sufficient API for us would be a new software
> > perf event, e.g. PERF_COUNT_SW_USERFAULTS, with an associated
> > PERF_SAMPLE_ADDR that would give us the address of the page. Does that
> > sound like a reasonable thing to add?
>
> Is some combination of bpf and kprobes a possible solution? There are
> some seemingly relevant examples here:
> https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
>
> I haven't tried it, but it seems like attaching to handle_userfault()
> would give similar information to PERF_COUNT_SW_PAGE_FAULTS, but for
> userfaults.

My understanding is that using bpf/kprobes requires new permissions
that are both not currently required by rr and would not be required
by our proposed solution.

- Kyle

2021-05-11 22:17:13

by Robert O'Callahan

Subject: Re: Userspace notifications for observing userfaultfd faults

On Wed, May 12, 2021 at 6:12 AM Axel Rasmussen <[email protected]> wrote:
> Is some combination of bpf and kprobes a possible solution? There are
> some seemingly relevant examples here:
> https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
>
> I haven't tried it, but it seems like attaching to handle_userfault()
> would give similar information to PERF_COUNT_SW_PAGE_FAULTS, but for
> userfaults.

That would probably work in some cases, but as Kyle said, that requires
privileges, whereas rr can currently run unprivileged (if you set
perf_event_paranoid to 1 or less) and usually does. Also, AFAIK,
kprobing handle_userfault would not be a stable ABI.
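
As a minimal sketch of the "1 or less" condition, the Python below reads the
setting and checks it. The function names are hypothetical and this is not
how rr itself performs the check; the only facts assumed are the threshold
stated above and the standard sysctl path
/proc/sys/kernel/perf_event_paranoid.

```python
from __future__ import annotations

from pathlib import Path

PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def paranoid_level_allows_unprivileged(level: int) -> bool:
    """True if the level meets the "1 or less" condition described above."""
    return level <= 1


def read_paranoid_level(path: Path = PARANOID_PATH) -> int | None:
    """Return the current setting, or None if it cannot be read or parsed."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("perf_event_paranoid could not be read")
    else:
        print(level, paranoid_level_allows_unprivileged(level))
```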

Rob

2021-05-11 22:26:09

by Axel Rasmussen

Subject: Re: Userspace notifications for observing userfaultfd faults

On Tue, May 11, 2021 at 3:15 PM Robert O'Callahan <[email protected]> wrote:
>
> On Wed, May 12, 2021 at 6:12 AM Axel Rasmussen <[email protected]> wrote:
> > Is some combination of bpf and kprobes a possible solution? There are
> > some seemingly relevant examples here:
> > https://github.com/iovisor/bpftrace/blob/master/docs/tutorial_one_liners.md
> >
> > I haven't tried it, but it seems like attaching to handle_userfault()
> > would give similar information to PERF_COUNT_SW_PAGE_FAULTS, but for
> > userfaults.
>
> That would probably work in some cases, but as Kyle said, that requires
> privileges, whereas rr can currently run unprivileged (if you set
> perf_event_paranoid to 1 or less) and usually does. Also, AFAIK,
> kprobing handle_userfault would not be a stable ABI.

True, it would not be a stable ABI. That could be solved by adding a
real tracepoint, instead of just relying on a kprobe on a particular
function. But, I don't think that solves the concern around
permissions.

I am no expert on PERF_COUNT_SW_PAGE_FAULTS and similar, so I'll leave
it up to others to give an opinion on extending that.

>
> Rob