LinuxLists.cc - [RFC] Userspace tracing memory mappings

2008-01-23 16:12:44

Subject: [RFC] Userspace tracing memory mappings

Hi,

Since memory management is not my speciality, I would like to know if
there are some implementation details I should be aware of for my
LTTng userspace tracing buffers. Here is what I want to do :

Upon a new alloc_tracebuf syscall:

- Map the ZERO_PAGE in the current process. Reserve enough pages to hold
  16 per-cpu trace buffers at the same time (this supports up to 16
  active traces). These could be mapped write-only by the traced process.
- Also reserve a few ZERO_PAGEs for the buffer control information
  (current read/write offsets...), mapped RW by the process.
- Also reserve some space for the kernel to export control information.
  These could be pages mapped read-only by the process (seqlock,
  tracing active...).
- When the process tries to write to these pages, allocate physical pages.
- The read-only (as seen by the process) pages should be allocated when
  the kernel has its first trace active. They can be the ZERO_PAGE
  before that.
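
As a rough illustration of the reservation step above, here is a minimal
Python sketch. It is not the proposed syscall: an anonymous mmap stands in
for alloc_tracebuf, and the names reservation_bytes, reserve_buffers and the
pages_per_buffer parameter are assumptions, since the proposal does not fix a
buffer size. An anonymous mapping reads back as zeros, and on Linux the kernel
normally backs such pages lazily on first touch, which is similar in spirit
to starting from the ZERO_PAGE.

```python
import mmap

MAX_TRACES = 16  # from the proposal: up to 16 active traces at once


def reservation_bytes(ncpus, pages_per_buffer, page_size=mmap.PAGESIZE):
    """Address space needed for one buffer per CPU for each possible trace.

    pages_per_buffer is a hypothetical knob; the proposal gives no size.
    """
    return MAX_TRACES * ncpus * pages_per_buffer * page_size


def reserve_buffers(ncpus, pages_per_buffer):
    """Reserve the whole region as a zero-filled anonymous mapping.

    Userspace stand-in (assumption) for the proposed alloc_tracebuf
    syscall; it only shows the size and the zero-filled start state.
    """
    return mmap.mmap(-1, reservation_bytes(ncpus, pages_per_buffer))
```

For example, reserve_buffers(ncpus=2, pages_per_buffer=4) reserves
16 * 2 * 4 pages, all of which read as zero until the process writes to them.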

When the process issues its first buffer switch (a second new syscall)
or exits before its first buffer switch, we create a debugfs file in the
trace directory for every active trace on the system. A userspace daemon
is notified of the file creation through inotify and maps the buffers
specific to a single trace (mmap on the file). The daemon already uses
an ioctl on the file to get the buffer offset to read. This is the
"disk writer" daemon.
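
The reader side of this could look roughly like the Python sketch below. It
only covers the read-only mapping step. The inotify notification and the
ioctl that returns the offset are not reproduced, so the offset is passed in
as a plain argument. An ordinary temporary file stands in for the debugfs
file, and all function names are hypothetical.

```python
import mmap
import os
import tempfile


def map_trace_file(path):
    """Map a per-trace buffer file read-only, as the disk writer would."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        return mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the mapping stays usable after the descriptor is closed


def read_range(mapping, offset, length):
    """Copy length bytes starting at offset out of the mapped buffer.

    In the proposal the offset would come from an ioctl on the file;
    here it is simply an argument (assumption).
    """
    return mapping[offset:offset + length]


def make_demo_trace_file(payload):
    """Create a temp file standing in for the debugfs file (assumption)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return path
```

Because the mapping is created with ACCESS_READ, the daemon can copy data out
of the buffers but any attempt to modify them through the mapping fails.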

I don't think the kernel really has to map the buffers in its address
space. For kernel crash buffer extraction, I guess we can simply deal
with pages instead of virtual addresses. By doing so, we could extract
the userspace tracing buffers upon kernel crash.

We have to be aware that a new trace can be allocated/activated on the
system while the process is running. Therefore, the kernel and the
process would share a few pages (RW for the kernel, RO for the traced
process) where the trace control information would be held. I would
re-create the trace control information update mechanism I currently
have in LTTng for kernel-only tracing (I use RCU), but, since RCU is not
available in user-space, I would use a write seqlock in the kernel and a
read seqlock in userspace. These pages would therefore have to be mapped
at 3 different locations :

- Buffers
  - traced process (write)
  - disk writing daemon (read-only)
- Buffer control information (buffer read/write offsets)
  - traced process (RW)
  - kernel mapping (RW) (the disk writing daemon issues an ioctl for
    offset updates and hence doesn't need to map this information)
- Tracing control information
  - kernel memory (RW)
  - traced process (read-only)

So if we want the tightest control possible, we would have to create 3
different mappings, initially populated with the zero page, populated by
page faults, and shared between two locations each.
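
To make the write seqlock / read seqlock idea concrete, here is a small
single-threaded Python simulation of the protocol. It is only a sketch: a
real seqlock also needs memory barriers and a shared page, which Python does
not model, and the SeqLock class, read_consistent helper and the control
field names used in the example are hypothetical.

```python
class SeqLock:
    """Single-writer sequence lock guarding a small control record."""

    def __init__(self, data):
        self.seq = 0            # even: stable, odd: write in progress
        self.data = dict(data)

    def write_begin(self):
        self.seq += 1           # sequence becomes odd

    def write_end(self):
        self.seq += 1           # sequence becomes even again

    def update(self, **fields):
        """Writer side (the kernel in the proposal)."""
        self.write_begin()
        self.data.update(fields)
        self.write_end()

    def read_begin(self):
        return self.seq

    def read_retry(self, start):
        """True if a snapshot taken since read_begin() must be discarded."""
        return bool(start & 1) or self.seq != start


def read_consistent(lock, max_tries=1000):
    """Reader side (the traced process): retry until the snapshot is stable."""
    for _ in range(max_tries):
        start = lock.read_begin()
        snapshot = dict(lock.data)
        if not lock.read_retry(start):
            return snapshot
    raise RuntimeError("writer did not finish within max_tries")
```

The reader never writes to the shared record, which is what allows the
tracing control pages to be mapped read-only in the traced process.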

Comments/ideas/concerns are welcome.

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2008-01-23 19:38:45

by Dave Hansen

Subject: Re: [RFC] Userspace tracing memory mappings

On Wed, 2008-01-23 at 11:04 -0500, Mathieu Desnoyers wrote:
> Since memory management is not my speciality, I would like to know if
> there are some implementation details I should be aware of for my
> LTTng userspace tracing buffers. Here is what I want to do :

Can you start with a little background by telling us what a userspace
tracing buffer _is_? Maybe a few requirements about what you need it to
do and why, as well?

-- Dave

2008-01-23 19:11:29

by Frank Ch. Eigler

Subject: Re: [RFC] Userspace tracing memory mappings

Mathieu Desnoyers writes:

> [...] Since memory management is not my speciality, I would like to
> know if there are some implementation details I should be aware of
> for my LTTng userspace tracing buffers. Here is what I want to do
> [...]

Would you mind offering some justification for requiring a kernel
extension for user-space tracing? What can the kernel enable in this
context that a user-space library (which you already assume will be
linked in) can't?

- FChE

2008-01-23 22:00:52

by Mathieu Desnoyers

Subject: Re: [RFC] Userspace tracing memory mappings

* Frank Ch. Eigler wrote:
> Mathieu Desnoyers writes:
>
> > [...] Since memory management is not my speciality, I would like to
> > know if there are some implementation details I should be aware of
> > for my LTTng userspace tracing buffers. Here is what I want to do
> > [...]
>
> Would you mind offering some justification for requiring a kernel
> extension for user-space tracing? What can the kernel enable in this
> context that a user-space library (which you already assume will be
> linked in) can't?
>
> - FChE

The kernel would provide:

- System-wide activation of markers located in userspace code.
  Example use: libc, NPTL tracing.
- Ability to extract the buffers of a crashed process.
- Ability to extract userspace tracing buffers upon kernel crash.
- Activation of userspace tracing at the same time as kernel tracing
  is activated, without having to mess with signals.
- Potentially, filtering of events coming from userspace, again without
  having to mess with signals.

Another point is early boot tracing: tracing processes such as init
requires using syscalls rather than relying on debugfs/dev/proc file
operations. We can't dump the information to disk yet at that point, so
we cannot expect the process itself to deal with opening files or
sockets that early. Therefore, we have to divide tracing into two
distinct actions: writing to the buffers and dumping the buffers (to
disk or through the network).

Another reason why we don't want to do everything in a single library is
that it would account the disk write time to the traced process. If we
do this from the kernel, we can know how much time it took because we
trace it. An even better reason is that if we want to extract the data
to disk or through the network, and want to get the last trace bits of a
segfaulted process, we have to share the buffers with another process
somehow. However, creating one extra process per traced process is kind
of awkward.

So the code itself would be a library in userspace. However, it would
interact both with the kernel for trace activation and with a daemon to
extract the information to disk or to the network. I am starting to
think that a userspace library would be sufficient for the userspace
part of this design (no need to modify the vDSO).

Also, the System V shared memory limit on the number of such mappings
one can have in the system is way too low.

Does this explain the purpose of the kernel interaction better?

Mathieu

2008-01-23 22:06:40

by Mathieu Desnoyers

Subject: Re: [RFC] Userspace tracing memory mappings

* Dave Hansen wrote:
> On Wed, 2008-01-23 at 11:04 -0500, Mathieu Desnoyers wrote:
> > Since memory management is not my speciality, I would like to know if
> > there are some implementation details I should be aware of for my
> > LTTng userspace tracing buffers. Here is what I want to do :
>
> Can you start with a little background by telling us what a userspace
> tracing buffer _is_? Maybe a few requirements about what you need it to
> do and why, as well?
>
> -- Dave
>

Sure,

Userspace tracing is:

- A userspace process wants to record information to a circular ring
  buffer. Each record carries a timestamp, and recording should disturb
  the timing of the process as little as possible. The timestamps must
  be synchronized with the timestamps given to the kernel trace events
  so we can analyze all the information together.

- When one subbuffer of the ring buffer is filled, the information is
  ready to be read by a "trace dumping" process and sent to disk or to
  the network. At this point, the traced process raises a flag that will
  be checked periodically by the OS to wake up the disk/network dumper
  daemon. (For future reference, I use the term "buffer writer" when I
  talk about the traced process and the term "buffer reader" when
  talking about the disk/network dumper daemon.)
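
The writer/reader split described above can be sketched as a small Python
simulation. This is an illustration under assumptions, not LTTng code: the
TraceRing class and its parameters are hypothetical, time.monotonic_ns is
only a stand-in for a clock synchronized with the kernel trace, and dropping
events when the next sub-buffer is still unread is one possible policy that
the description does not specify.

```python
import time


class TraceRing:
    """Circular buffer split into fixed-size sub-buffers (illustrative sizes)."""

    def __init__(self, n_subbufs=4, events_per_subbuf=3,
                 clock=time.monotonic_ns):
        self.n_subbufs = n_subbufs
        self.events_per_subbuf = events_per_subbuf
        self.clock = clock                  # stand-in timestamp source
        self.subbufs = [[] for _ in range(n_subbufs)]
        self.ready = [False] * n_subbufs    # flags checked by the reader side
        self.current = 0                    # sub-buffer being written
        self.read_idx = 0                   # next sub-buffer to consume

    def write(self, payload):
        """Buffer writer: append a timestamped event.

        Returns False when the event is dropped because the current
        sub-buffer has not been consumed yet (assumed policy).
        """
        if self.ready[self.current]:
            return False
        sub = self.subbufs[self.current]
        sub.append((self.clock(), payload))
        if len(sub) == self.events_per_subbuf:
            self.ready[self.current] = True  # raise the "filled" flag
            self.current = (self.current + 1) % self.n_subbufs
        return True

    def consume(self):
        """Buffer reader: drain every sub-buffer flagged ready, oldest first."""
        out = []
        while self.ready[self.read_idx]:
            out.extend(self.subbufs[self.read_idx])
            self.subbufs[self.read_idx] = []
            self.ready[self.read_idx] = False
            self.read_idx = (self.read_idx + 1) % self.n_subbufs
        return out
```

The writer only touches the sub-buffer it is filling and its flag, so the
reader can work on completed sub-buffers without stalling the traced process.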

There is more information in the email I sent to Frank Eigler. Please
feel free to ask for more if I am not clear about specific points.

A lot of the background information is already explained in the kernel
tracing paper I presented at OLS2006; it might be a good start:

http://ltt.polymtl.ca/papers/desnoyers-ols2006.pdf

Another requirement I am trying to meet is protection of the tracing
buffers against corruption coming from other userspace processes. K42
implemented its tracing buffers as shared system-wide, including with
the OS. The processes have full access to the kernel buffers and can
therefore corrupt the whole system's trace. This is something I would
like not to allow.

Mathieu
