This quick tracepipe patch is an attempt to make it trivial for a
developer to generate debugging traces in the kernel and get them to
mass storage without the tracing enormously skewing behaviour. It
is built on Greg's debugfs and Linus' advocacy of efficient buffer
transit with pipes and pipe_buffers.
A kernel subsystem registers a file in debugfs and gets an opaque
struct pointer back as a cookie. The file that is created uses
fs/pipe's file_operations but wraps its own open and release tracking
around them.
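From that description, registration might look something like the
sketch below; tracepipe_register() and struct tracepipe are names I'm
assuming for illustration, not necessarily the patch's actual API.

#include <linux/err.h>

/* illustrative declarations; the patch's real API may differ */
struct tracepipe;
struct tracepipe *tracepipe_register(const char *name);

static struct tracepipe *tp;

static int __init my_subsys_init(void)
{
	/* creates the pipe-backed file in debugfs; tp is the opaque
	 * cookie handed back for later trace calls */
	tp = tracepipe_register("my-subsys");
	return IS_ERR(tp) ? PTR_ERR(tp) : 0;
}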
While it's running, the kernel subsystem can send binary blobs, each
less than a page in length, down this channel. The blobs are copied
into per-cpu lists of pages. Cutesy little headers with get_cycles()
and the cpu id are prepended to each blob. The traces are only
recorded if user space has open references to the file.
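Concretely, the prepended header might look like this minimal sketch;
the struct layout and the tracepipe_event() entry point are my
assumptions, not lifted from the patch.

#include <linux/types.h>
#include <asm/timex.h>

struct tracepipe;	/* opaque cookie from registration */
void tracepipe_event(struct tracepipe *tp, const void *blob, size_t len);

/* assumed layout of the per-blob header described above */
struct tracepipe_hdr {
	cycles_t	cycles;	/* get_cycles() at log time */
	unsigned int	cpu;	/* logging cpu */
	unsigned int	len;	/* blob length, must be < PAGE_SIZE */
};

static void log_example(struct tracepipe *tp)
{
	struct { u64 ino; u32 op; } ev = { .ino = 42, .op = 1 };

	tracepipe_event(tp, &ev, sizeof(ev));	/* hypothetical call */
}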
As the pages fill they're kicked over to a work_struct worker that
puts them in the bufs[] array of the debugfs pipe file. Userspace can
then do whatever it wants with the data via the pipe. One can imagine
it wanting to splice() these pages to disk in huge batches, or perhaps
to some zero-copy network card, etc. So far I've only tested this by
verifying that 'cat' can push the data into a regular file.
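For instance, a user-space drain using splice(2) might look like the
following sketch; the debugfs path is made up, and splice of course
requires a kernel that has it.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int in = open("/sys/kernel/debug/tracepipe/my-subsys", O_RDONLY);
	int out = open("trace.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	ssize_t n;

	if (in < 0 || out < 0) {
		perror("open");
		return 1;
	}
	/* move whole pages from the kernel's pipe buffers to the file
	 * without bouncing the data through user space */
	while ((n = splice(in, NULL, out, NULL, 1 << 16, SPLICE_F_MOVE)) > 0)
		;
	if (n < 0)
		perror("splice");
	return n < 0;
}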
I didn't aim for optimal behaviour before sending this patch out.
I'm looking for comments. There's lots of room for improvement,
particularly in reducing synchronization between cpus (though not as
each dinky page fills) and in more flexible buffering semantics. As
written there is no support for cyclic lists of pages rather than
streaming, nor is a huge number of pending pages allowed. debugfs
masks the mode of the file so it appears to be a regular file, but
that'd be trivial to fix if this methodology is useful.
Thoughts? I, for one, am tired of writing throw-away per-cpu tracing
patches ;)
Zach Brown wrote:
> Thoughts? I, for one, am tired of writing throw-away per-cpu tracing
> patches ;)
Have you taken a look at relayfs and ltt?
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [email protected] || 1-866-677-4546
Karim Yaghmour wrote:
> Zach Brown wrote:
>
>> Thoughts? I, for one, am tired of writing throw-away per-cpu tracing
>> patches ;)
>
> Have you taken a look at relayfs and ltt?
Only briefly. They've always seemed more involved than the sort of
thing I was after. I'll try and sit down and investigate in more detail.
Zach Brown wrote:
> Only briefly. They've always seemed more involved than the sort of
> thing I was after. I'll try and sit down and investigate in more detail.
There's definitely an opportunity for interfacing here. If nothing else,
this clearly shows the interest in the kind of things both relayfs and
ltt attempt to achieve.
So here are a few comments regarding the implementation and how this
relates to the stuff I'm working on.
> While it's running, the kernel subsystem can send binary blobs, each
> less than a page in length, down this channel. The blobs are copied
> into per-cpu lists of pages. Cutesy little headers with get_cycles()
> and the cpu id are prepended to each blob. The traces are only
> recorded if user space has open references to the file.
In the case of LTT, we just open one relay channel per cpu. This avoids
having to write the cpu id into the trace, which is 2 bytes less per
event, and also avoids any need for synchronization.
As for get_cycles(), some architectures don't have anything useful to
return. Here's ARM's (include/asm-arm/timex.h):
static inline cycles_t get_cycles (void)
{
	return 0;
}
In the case of LTT, we just use the, albeit expensive, do_gettimeofday()
when hardware counters aren't there (currently all non-x86 tracing does
this, but it should be fixed). Also, on x86 at least, we write only the
lower 32 bits of the TSC, so that's 4 bytes less per event. Instead, we
use the buffer_start and buffer_end callbacks provided by relayfs to
write a header and footer containing the full do_gettimeofday and TSC
values.
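In other words, something along these lines; the struct names are
illustrative, not the actual LTT definitions.

#include <linux/time.h>
#include <linux/types.h>

struct ltt_event_stamp {
	u32 tsc_low;		/* only the lower 32 bits of the TSC */
};

/* written once per buffer from the buffer_start/buffer_end
 * callbacks, so readers can reconstruct full timestamps */
struct ltt_buffer_stamp {
	struct timeval tv;	/* full do_gettimeofday() value */
	u64 tsc;		/* full 64-bit TSC */
};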
> As the pages fill they're kicked over to a work_struct worker that
> puts them in the bufs[] array of the debugfs pipe file. Userspace can
> then do whatever it wants with the data via the pipe. One can imagine
> it wanting to splice() these pages to disk in huge batches, or perhaps
> to some zero-copy network card, etc. So far I've only tested this by
> verifying that 'cat' can push the data into a regular file.
It seems to me that while this is a nice use of pipes, it isn't as fast
as ram-locked pages. relayfs basically does the bttv driver magic (or
what used to be done in there; I haven't checked what they do lately):
we allocate pages, lock them into ram, and remap them for use as a
single memory area. No caching necessary. It goes straight from the
buffer to whatever media you want (disk, network, etc.). IOW, user-space
does an open(), mmap(), write(). Also, the channels exist whether
user-space has done an open or not. That's good for flight-recording.
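A minimal sketch of that buffer setup, assuming nothing beyond stock
kernel primitives (this is not the actual relayfs code):

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *alloc_relay_buf(struct page **pages, unsigned int n_pages)
{
	unsigned int i;

	for (i = 0; i < n_pages; i++) {
		pages[i] = alloc_page(GFP_KERNEL);
		if (!pages[i])
			goto fail;
		SetPageReserved(pages[i]);	/* lock into ram */
	}
	/* remap the scattered pages as one contiguous kernel area;
	 * the same pages can later be mapped to user space */
	return vmap(pages, n_pages, VM_MAP, PAGE_KERNEL);
fail:
	while (i--) {
		ClearPageReserved(pages[i]);
		__free_page(pages[i]);
	}
	return NULL;
}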
Looking at the code:
- tracepipe_event() does a get_cpu()/put_cpu() to protect the write to
the buffer. What about tracing from within an interrupt?
local_irq_save()? (See the sketch after these comments.)
- I hadn't thought of doing something like this to write the header:
+ hdr = tcpu->next_region;
+ hdr->cycles = get_cycles();
+ hdr->cpu = cpu;
I will replace some of the memcpy() code in LTT with something like this.
- From what I assume is a "wishlist":
+ * - actually communicate missed to userspace
Already done in LTT.
+ * - how to specify wrapping or dropping
relayfs provides RELAY_MODE_CONTINUOUS and RELAY_MODE_NO_OVERWRITE.
+ * - non-temporal stores into bufs
The latest relayfs code doesn't care about timestamps. It's its
clients' job to do that (e.g. ltt).
+ * - let caller reserve space and get a pointer into buf
This is the relevant relayfs function:
char* relay_reserve(struct rchan *rchan, u32 len, int *err, int *interrupting)
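Here's the interrupt point from above as a sketch; the struct names
and fields are assumptions about the patch, not taken from it.

struct tracepipe_cpu {
	void *next_region;	/* write cursor into the current page */
};

struct tracepipe {
	struct tracepipe_cpu cpus[NR_CPUS];
};

void tracepipe_event(struct tracepipe *tp, const void *blob, size_t len)
{
	unsigned long flags;
	struct tracepipe_cpu *tcpu;

	/* disabling local interrupts (rather than just preemption via
	 * get_cpu()) keeps an irq handler on this cpu from logging
	 * into a half-written record */
	local_irq_save(flags);
	tcpu = &tp->cpus[smp_processor_id()];
	/* ... prepend header, copy blob into the current page ... */
	local_irq_restore(flags);
}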
Karim