2002-09-21 23:07:10

by Karim Yaghmour

[permalink] [raw]
Subject: [PATCH] LTT for 2.5.37 2/9: Trace driver


This is the LTT trace driver itself. This is the only place where the
configuration option of the lockless scheme actually has any effect.
Otherwise, there would to much of #ifdef's mess around.

Here are the file modifications:
drivers/Makefile | 1
drivers/trace/Config.help | 48
drivers/trace/Config.in | 8
drivers/trace/Makefile | 17
drivers/trace/tracer.c | 2398 ++++++++++++++++++++++++++++++++++++++++++++++
drivers/trace/tracer.h | 233 ++++
6 files changed, 2705 insertions

diff -urpN linux-2.5.37/drivers/Makefile linux-2.5.37-ltt/drivers/Makefile
--- linux-2.5.37/drivers/Makefile Fri Sep 20 11:20:32 2002
+++ linux-2.5.37-ltt/drivers/Makefile Fri Sep 20 12:43:27 2002
@@ -41,5 +41,6 @@ obj-$(CONFIG_MD) += md/
obj-$(CONFIG_BLUEZ) += bluetooth/
obj-$(CONFIG_HOTPLUG_PCI) += hotplug/
obj-$(CONFIG_ISDN_BOOL) += isdn/
+obj-$(CONFIG_TRACE) += trace/

include $(TOPDIR)/Rules.make
diff -urpN linux-2.5.37/drivers/trace/Config.help linux-2.5.37-ltt/drivers/trace/Config.help
--- linux-2.5.37/drivers/trace/Config.help Wed Dec 31 19:00:00 1969
+++ linux-2.5.37-ltt/drivers/trace/Config.help Sat Sep 21 18:02:51 2002
@@ -0,0 +1,48 @@
+Kernel events tracing support
+CONFIG_TRACE
+ It is possible for the kernel to log important events to a tracing
+ driver. Doing so, enables the use of the generated traces in order
+ to reconstruct the dynamic behavior of the kernel, and hence the
+ whole system.
+
+ The tracing process contains 4 parts :
+ 1) The logging of events by key parts of the kernel.
+ 2) The trace driver that keeps the events in a data buffer.
+ 3) A trace daemon that opens the trace driver and is notified
+ every time there is a certain quantity of data to read
+ from the trace driver (using SIG_IO).
+ 4) A trace event data decoder that reads the accumulated data
+ and formats it in a human-readable format.
+
+ If you say Y or M here, the first part of the tracing process will
+ always take place. That is, critical parts of the kernel will call
+ upon the kernel tracing function. The data generated doesn't go
+ any further until a trace driver registers himself as such with the
+ kernel. Therefore, if you answer Y, then the driver will be part of
+ the kernel and the events will always proceed onto the driver and
+ if you say M, then the events will only proceed onto the driver when
+ it's module is loaded. Note that event's aren't logged in the driver
+ until the profiling daemon opens the device, configures it and
+ issues the "start" command through ioctl().
+
+ The impact of a fully functionnal system (kernel event logging +
+ driver event copying + active trace daemon) is of 2.5% for core events.
+ This means that for a task that took 100 seconds on a normal system, it
+ will take 102.5 seconds on a traced system. This is very low compared
+ to other profiling or tracing methods.
+
+ For more information on kernel tracing, the trace daemon or the event
+ decoder, please check the following address :
+ http://www.opersys.com/LTT
+
+CONFIG_LOCKLESS_TRACE
+ There are normally two tracing schemes available and selectable at
+ run-time via the trace daemon - locking and lockless. In some cases
+ e.g. embedded real-time systems, it may be desirable to exclude the
+ lockless code from the driver in the interest of making it smaller.
+ Even in such a case, the advantages provided by the lockless code
+ outweigh the slight increase in size (about 4KB). Unless you're
+ really out of space, keep this to Y. Setting this to N is probably
+ a sign that you probably have size problems elsewhere ...
+
+ If unsure, say Y.
diff -urpN linux-2.5.37/drivers/trace/Config.in linux-2.5.37-ltt/drivers/trace/Config.in
--- linux-2.5.37/drivers/trace/Config.in Wed Dec 31 19:00:00 1969
+++ linux-2.5.37-ltt/drivers/trace/Config.in Sat Sep 21 17:56:34 2002
@@ -0,0 +1,8 @@
+mainmenu_option next_comment
+comment 'Kernel tracing'
+tristate 'Kernel events tracing support' CONFIG_TRACE
+if [ "$CONFIG_TRACE" != "n" ]; then
+ dep_mbool ' Lock-free tracing support' CONFIG_LOCKLESS_TRACE $CONFIG_TRACE
+fi
+
+endmenu
diff -urpN linux-2.5.37/drivers/trace/Makefile linux-2.5.37-ltt/drivers/trace/Makefile
--- linux-2.5.37/drivers/trace/Makefile Wed Dec 31 19:00:00 1969
+++ linux-2.5.37-ltt/drivers/trace/Makefile Fri Sep 20 12:43:27 2002
@@ -0,0 +1,17 @@
+#
+# Makefile for the kernel tracing drivers.
+#
+# Note! Dependencies are done automagically by 'make dep', which also
+# removes any old dependencies. DON'T put your own dependencies here
+# unless it's something special (ie not a .c file).
+#
+# Note 2! The CFLAGS definitions are now inherited from the
+# parent makes..
+#
+
+O_TARGET := built-in.o
+
+# Is it loaded as a module or as part of the kernel
+obj-$(CONFIG_TRACE) = tracer.o
+
+include $(TOPDIR)/Rules.make
diff -urpN linux-2.5.37/drivers/trace/tracer.c linux-2.5.37-ltt/drivers/trace/tracer.c
--- linux-2.5.37/drivers/trace/tracer.c Wed Dec 31 19:00:00 1969
+++ linux-2.5.37-ltt/drivers/trace/tracer.c Sat Sep 21 17:56:34 2002
@@ -0,0 +1,2398 @@
+/*
+ * linux/drivers/trace/tracer.c
+ *
+ * (C) Copyright, 1999, 2000, 2001, 2002 - Karim Yaghmour ([email protected])
+ *
+ * Contains the code for the kernel tracing driver (tracer for short).
+ *
+ * Author:
+ * Karim Yaghmour (karim[email protected])
+ *
+ * Changelog:
+ * 16/02/02, Added Tom Zanussi's implementation of K42's lockless logging.
+ * K42 tracing guru Robert Wisniewski participated in the
+ * discussions surrounding this implementation. A big thanks to
+ * the IBM folks.
+ * 03/12/01, Added user event support.
+ * 05/01/01, Modified PPC bit manipulation functions for x86 compatibility.
+ * ([email protected])
+ * 15/11/00, Finally fixed memory allocation and remapping method. Now using
+ * BTTV-driver-inspired code.
+ * 13/03/00, Modified tracer so that the daemon mmaps the tracer's buffers
+ * in it's address space rather than use "read".
+ * 26/01/00, Added support for standardized buffer sizes and extensibility
+ * of events.
+ * 01/10/99, Modified tracer in order to used double-buffering.
+ * 28/09/99, Adding tracer configuration support.
+ * 09/09/99, Chaging the format of an event record in order to reduce the
+ * size of the traces.
+ * 04/03/99, Initial typing.
+ *
+ * Note:
+ * The sizes of the variables used to store the details of an event are
+ * planned for a system who gets at least one clock tick every 10
+ * milli-seconds. There has to be at least one event every 2^32-1
+ * microseconds, otherwise the size of the variable holding the time doesn't
+ * work anymore.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/time.h>
+#include <linux/trace.h>
+#include <linux/wrapper.h>
+#include <linux/vmalloc.h>
+#include <linux/mm.h>
+
+#include <asm/io.h>
+#include <asm/current.h>
+#include <asm/uaccess.h>
+#include <asm/bitops.h>
+#include <asm/pgtable.h>
+#include <asm/trace.h>
+
+#include "tracer.h"
+
+/* Module information */
+MODULE_AUTHOR("Karim Yaghmour ([email protected])");
+MODULE_DESCRIPTION("Linux Trace Toolkit (LTT) kernel tracing driver");
+MODULE_LICENSE("GPL");
+
+/* Driver */
+static int sMajorNumber; /* Major number of the tracer */
+static int sOpenCount; /* Number of times device is open */
+/* Locking */
+static int sTracLock; /* Tracer lock used to lock primary buffer */
+static spinlock_t sSpinLock; /* Spinlock in order to lock kernel */
+/* Daemon */
+static int sSignalSent; /* A signal has been sent to the daemon */
+static struct task_struct* sDaemonTaskStruct; /* Task structure of the tracer daemon */
+/* Tracer configuration */
+static int sTracerStarted; /* Is the tracer started */
+static trace_event_mask sTracedEvents; /* Bit-field of events being traced */
+static trace_event_mask sLogEventDetailsMask; /* Log the details of the events mask */
+static int sLogCPUID; /* Log the CPUID associated with each event */
+static int sUseSyscallEIPBounds; /* Use adress bounds to fetch the EIP where call is made */
+static int sLowerEIPBoundSet; /* The lower bound EIP has been set */
+static int sUpperEIPBoundSet; /* The upper bound EIP has been set */
+static void* sLowerEIPBound; /* The lower bound EIP */
+static void* sUpperEIPBound; /* The upper bound EIP */
+static int sTracingPID; /* Tracing only the events for one pid */
+static int sTracingPGRP; /* Tracing only the events for one process group */
+static int sTracingGID; /* Tracing only the events for one gid */
+static int sTracingUID; /* Tracing only the events for one uid */
+static pid_t sTracedPID; /* PID being traced */
+static pid_t sTracedPGRP; /* Process group being traced */
+static gid_t sTracedGID; /* GID being traced */
+static uid_t sTracedUID; /* UID being traced */
+static int sSyscallEIPDepthSet; /* The call depth at which to fetch EIP has been set */
+static int sSyscallEIPDepth; /* The call depth at which to fetch the EIP */
+/* Event data buffers */
+static int sBufReadComplete; /* Number of buffers completely filled */
+static int sSizeReadIncomplete; /* Quantity of data read from incomplete buffers */
+static int sEventsLost; /* Number of events lost because of lack of buffer space */
+static u32 sBufSize; /* Buffer sizes */
+static u32 sAllocSize; /* Size of buffers allocated */
+static u32 sBufferID; /* Unique buffer ID */
+static char* sTracBuf = NULL; /* Trace buffer */
+static char* sWritBuf = NULL; /* Buffer used for writting */
+static char* sReadBuf = NULL; /* Buffer used for reading */
+static char* sWritBufEnd; /* End of write buffer */
+static char* sReadBufEnd; /* End of read buffer */
+static char* sWritPos; /* Current position for writting */
+static char* sReadLimit; /* Limit at which read should stop */
+static char* sWritLimit; /* Limit at which write should stop */
+static int sUseLocking; /* Holds command from daemon */
+static u32 sBufnoBits; /* Holds command from daemon */
+static u32 sBufOffsetBits; /* Holds command from daemon */
+static int sBuffersFull; /* All-buffers-full boolean */
+
+/* Time */
+static struct timeval sBufferStartTime; /* The time at which the buffer was started */
+
+/* Large data components allocated at load time */
+static char *sUserEventData = NULL; /* The data associated with a user event */
+
+/* The global per-buffer control data structure, shared between the tracing
+ driver and the trace daemon via ioctl. */
+static struct buffer_control sBufferControl;
+
+/* The size of the structures used to describe the events */
+static int sEventStructSize[TRACE_EV_MAX + 1] =
+{
+ sizeof(trace_start) /* TRACE_START */ ,
+ sizeof(trace_syscall_entry) /* TRACE_SYSCALL_ENTRY */ ,
+ 0 /* TRACE_SYSCALL_EXIT */ ,
+ sizeof(trace_trap_entry) /* TRACE_TRAP_ENTRY */ ,
+ 0 /* TRACE_TRAP_EXIT */ ,
+ sizeof(trace_irq_entry) /* TRACE_IRQ_ENTRY */ ,
+ 0 /* TRACE_IRQ_EXIT */ ,
+ sizeof(trace_schedchange) /* TRACE_SCHEDCHANGE */ ,
+ 0 /* TRACE_KERNEL_TIMER */ ,
+ sizeof(trace_soft_irq) /* TRACE_SOFT_IRQ */ ,
+ sizeof(trace_process) /* TRACE_PROCESS */ ,
+ sizeof(trace_file_system) /* TRACE_FILE_SYSTEM */ ,
+ sizeof(trace_timer) /* TRACE_TIMER */ ,
+ sizeof(trace_memory) /* TRACE_MEMORY */ ,
+ sizeof(trace_socket) /* TRACE_SOCKET */ ,
+ sizeof(trace_ipc) /* TRACE_IPC */ ,
+ sizeof(trace_network) /* TRACE_NETWORK */ ,
+ sizeof(trace_buffer_start) /* TRACE_BUFFER_START */ ,
+ 0 /* TRACE_BUFFER_END */ ,
+ sizeof(trace_new_event) /* TRACE_NEW_EVENT */ ,
+ sizeof(trace_custom) /* TRACE_CUSTOM */ ,
+ sizeof(trace_change_mask) /* TRACE_CHANGE_MASK */
+};
+
+/* The file operations available for the tracer */
+static struct file_operations sTracerFileOps =
+{
+ owner: THIS_MODULE,
+ ioctl: tracer_ioctl,
+ mmap: tracer_mmap,
+ open: tracer_open,
+ release: tracer_release,
+ fsync: tracer_fsync,
+};
+
+#if CONFIG_LOCKLESS_TRACE
+static u32 sLastEventIndex; /* For full-buffers state */
+static struct timeval sLastEventTimeStamp; /* For full-buffers state */
+/* Space reserved for TRACE_EV_BUFFER_START */
+static u32 sStartReserve = TRACER_FIRST_EVENT_SIZE;
+
+/* Space reserved for TRACE_EV_BUFFER_END event + sizeof lost word, which
+ though the sizeof lost word isn't necessarily contiguous with rest of
+ event (it's always at the end of the buffer) is included here for code
+ clarity. */
+static u32 sEndReserve = TRACER_LAST_EVENT_SIZE;
+#endif /* CONFIG_LOCKLESS_TRACE */
+
+/* This inspired by rtai/shmem */
+#define FIX_SIZE(x) (((x) - 1) & PAGE_MASK) + PAGE_SIZE
+
+/* \begin{Code inspired from BTTV driver} */
+
+/* Here we want the physical address of the memory.
+ * This is used when initializing the contents of the
+ * area and marking the pages as reserved.
+ */
+static inline unsigned long kvirt_to_pa(unsigned long adr)
+{
+ unsigned long kva, ret;
+
+ kva = (unsigned long) page_address(vmalloc_to_page((void *) adr));
+ kva |= adr & (PAGE_SIZE - 1); /* restore the offset */
+ ret = __pa(kva);
+ return ret;
+}
+
+static void *rvmalloc(unsigned long size)
+{
+ void *mem;
+ unsigned long adr;
+
+ mem = vmalloc_32(size);
+ if (!mem)
+ return NULL;
+
+ memset(mem, 0, size); /* Clear the ram out, no junk to the user */
+ adr = (unsigned long) mem;
+ while (size > 0) {
+ mem_map_reserve(vmalloc_to_page((void *) adr));
+ adr += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ }
+
+ return mem;
+}
+
+static void rvfree(void *mem, unsigned long size)
+{
+ unsigned long adr;
+
+ if (!mem)
+ return;
+
+ adr = (unsigned long) mem;
+ while ((long) size > 0) {
+ mem_map_unreserve(vmalloc_to_page((void *) adr));
+ adr += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ }
+ vfree(mem);
+}
+
+static int tracer_mmap_region(struct vm_area_struct *vma,
+ const char *adr,
+ const char *start_pos,
+ unsigned long size)
+{
+ unsigned long start = (unsigned long) adr;
+ unsigned long page, pos;
+
+ pos = (unsigned long) start_pos;
+ while (size > 0) {
+ page = kvirt_to_pa(pos);
+ if (remap_page_range(vma, start, page, PAGE_SIZE, PAGE_SHARED))
+ return -EAGAIN;
+ start += PAGE_SIZE;
+ pos += PAGE_SIZE;
+ size -= PAGE_SIZE;
+ }
+ return 0;
+}
+/* \end{Code inspired from BTTV driver} */
+
+/**
+ * tracer_write_to_buffer: - Write data to destination buffer
+ *
+ * Writes data to the destination buffer and updates the begining the
+ * buffer write position.
+ */
+#define tracer_write_to_buffer(DEST, SRC, SIZE) \
+do\
+{\
+ memcpy(DEST, SRC, SIZE);\
+ DEST += SIZE;\
+} while(0);
+
+#if CONFIG_LOCKLESS_TRACE
+/*** Lockless scheme functions ***/
+
+/**
+ * init_buffer_control: - Init buffer control struct for new tracing run.
+ * @pmBC: buffer control struct to be initialized
+ * @pmUseLockless: which tracing scheme to use, TRUE for lockless
+ * @pmBufnoBits: number of bits in index word to use for buffer number
+ * @pmOffsetBits: number of bits in index word to use for buffer offset
+ *
+ * Sanity of param values should be checked by caller. i.e. bufno_bits and
+ * offset_bits must reflect sane buffer sizes/numbers.
+ */
+static void init_buffer_control(struct buffer_control * pmBC,
+ int pmUseLockless,
+ u8 pmBufnoBits,
+ u8 pmOffsetBits)
+{
+ unsigned i;
+
+ if((pmBC->using_lockless = pmUseLockless) == TRUE) {
+ pmBC->index = sStartReserve;
+ pmBC->bufno_bits = pmBufnoBits;
+ pmBC->n_buffers = TRACE_MAX_BUFFER_NUMBER(pmBufnoBits);
+ pmBC->offset_bits = pmOffsetBits;
+ pmBC->offset_mask = TRACE_BUFFER_OFFSET_MASK(pmOffsetBits);
+ pmBC->index_mask = (1UL << (pmBufnoBits + pmOffsetBits)) - 1;
+
+ pmBC->buffers_produced = pmBC->buffers_consumed = 0;
+
+ /* When a new buffer is switched to, TRACE_BUFFER_SIZE is
+ subtracted from its fill_count in order to initialize it
+ to the empty state. The reason it's done this way is
+ because an intervening event may have already been written
+ to the buffer while we were in the process of switching and
+ thus blindly initializing to 0 would erase that event.
+ The first buffer is initialized to 0 and the others are
+ initialized to TRACE_BUFFER_SIZE because the very first
+ buffer we ever see won't be initialized in that way by
+ the switching code and since there's never been an event,
+ we know it should be 0 and that it must be explicitly
+ initialized that way before logging begins. sStartReserve
+ is is factored into the end-of-buffer processing, so isn't
+ added to the fill counts here, except for the first. */
+ atomic_set(&pmBC->fill_count[0], (int)sStartReserve);
+ for(i = 1; i < TRACER_MAX_BUFFERS; i++)
+ atomic_set(&pmBC->fill_count[i], (int)TRACE_BUFFER_SIZE(pmOffsetBits));
+
+ /* All buffers are empty at this point */
+ sBuffersFull = FALSE;
+ }
+}
+
+/* These inline atomic functions wrap the linux versions in order to
+ implement the interface we want as well as to ensure memory barriers. */
+
+/**
+ * compare_and_store_volatile: - Self-explicit
+ * @ptr: ptr to the word that will receive the new value
+ * @oval: the value we think is currently in *ptr
+ * @nval: the value *ptr will get if we were right
+ *
+ * If *ptr is still what we think it is, atomically assign nval to it and
+ * return a boolean indicating TRUE if the new value was stored, FALSE
+ * otherwise.
+ *
+ * Pseudocode for this operation:
+ *
+ * if(*ptr == oval) {
+ * *ptr = nval;
+ * return TRUE;
+ * } else {
+ * return FALSE;
+ * }
+ */
+inline int compare_and_store_volatile(volatile u32 *ptr,
+ u32 oval,
+ u32 nval)
+{
+ u32 prev;
+
+ barrier();
+ prev = cmpxchg(ptr, oval, nval);
+ barrier();
+
+ return (prev == oval);
+}
+
+/**
+ * atomic_set_volatile: - Atomically set the value in ptr to nval.
+ * @ptr: ptr to the word that will receive the new value
+ * @nval: the new value
+ *
+ * Uses memory barriers to set *ptr to nval.
+ */
+inline void atomic_set_volatile(atomic_t *ptr,
+ u32 nval)
+{
+ barrier();
+ atomic_set(ptr, (int)nval);
+ barrier();
+}
+
+/**
+ * atomic_add_volatile: - Atomically add val to the value at ptr.
+ * @ptr: ptr to the word that will receive the addition
+ * @val: the value to add to *ptr
+ *
+ * Uses memory barriers to add val to *ptr.
+ */
+inline void atomic_add_volatile(atomic_t *ptr, u32 val)
+{
+ barrier();
+ atomic_add((int)val, ptr);
+ barrier();
+}
+
+/**
+ * atomic_sub_volatile: - Atomically substract val from the value at ptr.
+ * @ptr: ptr to the word that will receive the subtraction
+ * @val: the value to subtract from *ptr
+ *
+ * Uses memory barriers to substract val from *ptr.
+ */
+inline void atomic_sub_volatile(atomic_t *ptr, s32 val)
+{
+ barrier();
+ atomic_sub((int)val, ptr);
+ barrier();
+}
+
+/**
+ * trace_commit: - Atomically commit a reserved slot in the buffer.
+ * @index: index into the trace buffer
+ * @len: the value to add to fill_count of the buffer contained in index
+ *
+ * Atomically add len to the fill_count of the buffer specified by the
+ * buffer number contained in index.
+ */
+static inline void trace_commit(u32 index, u32 len)
+{
+ u32 bufno = TRACE_BUFFER_NUMBER_GET(index, sBufferControl.offset_bits);
+ atomic_add_volatile(&sBufferControl.fill_count[bufno], len);
+}
+
+/**
+ * write_start_buffer_event: - Write start-buffer event to buffer start.
+ * @pmIndex: index into the trace buffer
+ * @pmTime: the time of the start-buffer event
+ *
+ * Writes start-buffer event at the start of the buffer specified by the
+ * buffer number contained in pmIndex.
+ */
+static inline void write_start_buffer_event(u32 pmIndex, struct timeval pmTime)
+{
+ trace_buffer_start lStartBufferEvent; /* Start of new buffer event */
+ u8 lEventID; /* Event ID of last event */
+ uint16_t lDataSize; /* Size of tracing data */
+ trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */
+ char* lWritPos; /* Current position for writing */
+
+ /* Clear the offset bits of index to get the beginning of buffer */
+ lWritPos = sTracBuf + TRACE_BUFFER_OFFSET_CLEAR(pmIndex,
+ sBufferControl.offset_mask);
+
+ /* Increment buffer ID */
+ sBufferID++;
+
+ /* Write the start of buffer event */
+ lStartBufferEvent.ID = sBufferID;
+ lStartBufferEvent.Time = pmTime;
+
+ /* Write event type to tracing buffer */
+ lEventID = TRACE_EV_BUFFER_START;
+ tracer_write_to_buffer(lWritPos,
+ &lEventID,
+ sizeof(lEventID));
+
+ /* Write event time delta to tracing buffer */
+ lTimeDelta = 0;
+ tracer_write_to_buffer(lWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+
+ /* Write event structure */
+ tracer_write_to_buffer(lWritPos,
+ &lStartBufferEvent,
+ sizeof(lStartBufferEvent));
+
+ /* Compute the data size */
+ lDataSize = sizeof(lEventID)
+ + sizeof(lTimeDelta)
+ + sizeof(lStartBufferEvent)
+ + sizeof(lDataSize);
+
+ /* Write the length of the event description */
+ tracer_write_to_buffer(lWritPos,
+ &lDataSize,
+ sizeof(lDataSize));
+}
+
+/**
+ * write_end_buffer_event: - Write end-buffer event to end of buffer.
+ * @pmIndex: index into the trace buffer
+ * @pmTime: the time of the end-buffer event
+ *
+ * Writes end-buffer event at the end of the buffer specified by the
+ * buffer number contained in pmIndex, at the offset also contained in
+ * pmIndex.
+ */
+static inline void write_end_buffer_event(u32 pmIndex, struct timeval pmTime)
+{
+ u8 lEventID; /* Event ID of last event */
+ u8 lCPUID; /* CPUID of currently runing process */
+ trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */
+ char* lWritPos; /* Current position for writing */
+
+ lWritPos = sTracBuf + pmIndex;
+
+ /* Write the CPUID to the tracing buffer, if required */
+ if (sLogCPUID == TRUE) {
+ lCPUID = smp_processor_id();
+ tracer_write_to_buffer(lWritPos,
+ &lCPUID,
+ sizeof(lCPUID));
+ }
+ /* Write event type to tracing buffer */
+ lEventID = TRACE_EV_BUFFER_END;
+ tracer_write_to_buffer(lWritPos,
+ &lEventID,
+ sizeof(lEventID));
+
+ /* Write event time delta to tracing buffer */
+ lTimeDelta = 0;
+ tracer_write_to_buffer(lWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+}
+
+/**
+ * write_lost_size: - Write lost size to end of buffer contained in index.
+ * @pmIndex: index into the trace buffer
+ * @pmSizeLost: number of bytes lost at the end of buffer
+ *
+ * Writes the value contained in pmSizeLost as the last word in the
+ * the buffer specified by the buffer number contained in pmIndex. The
+ * 'lost size' is the number of bytes that are left unused by the tracing
+ * scheme at the end of a buffer for a variety of reasons.
+ */
+static inline void write_lost_size(u32 pmIndex, u32 pmSizeLost)
+{
+ char* lWritBufEnd; /* End of buffer */
+
+ /* Get end of buffer by clearing offset and adding buffer size */
+ lWritBufEnd = sTracBuf
+ + TRACE_BUFFER_OFFSET_CLEAR(pmIndex, sBufferControl.offset_mask)
+ + TRACE_BUFFER_SIZE(sBufferControl.offset_bits);
+
+ /* Write size lost at the end of the buffer */
+ *((u32 *) (lWritBufEnd - sizeof(pmSizeLost))) = pmSizeLost;
+}
+
+/**
+ * finalize_buffer: - Utility function consolidating end-of-buffer tasks.
+ * @pmEndIndex: index into trace buffer to write the end-buffer event at
+ * @pmSizeLost: number of unused bytes at the end of the buffer
+ * @pmTimestamp: the time of the end-buffer event
+ *
+ * This function must be called from within a lock, because it increments
+ * buffers_produced.
+ */
+static inline void finalize_buffer(u32 pmEndIndex, u32 pmSizeLost, struct timeval *pmTimestamp)
+{
+ /* Write end buffer event as last event in old buffer. */
+ write_end_buffer_event(pmEndIndex, *pmTimestamp);
+
+ /* In any buffer switch, we need to write out the lost size,
+ which can be 0. */
+ write_lost_size(pmEndIndex, pmSizeLost);
+
+ /* Add the size lost and end event size to fill_count so that
+ the old buffer won't be seen as incomplete. */
+ trace_commit(pmEndIndex, pmSizeLost);
+
+ /* Every finalized buffer means a produced buffer */
+ sBufferControl.buffers_produced++;
+}
+
+/**
+ * finalize_lockless_trace: - finalize last buffer at end of trace
+ *
+ * Called when tracing is stopped, to finish processing last buffer.
+ */
+static inline void finalize_lockless_trace(void)
+{
+ u32 lEventsEnd; /* Index of end of last event */
+ u32 lSizeLost; /* Bytes after end of last event */
+ unsigned long int lFlags; /* CPU flags for lock */
+
+ /* Find index of end of last event */
+ lEventsEnd = TRACE_BUFFER_OFFSET_GET(sBufferControl.index, sBufferControl.offset_mask);
+
+ /* Size lost in buffer is the unused space after end of last event
+ and end of buffer. */
+ lSizeLost = TRACE_BUFFER_SIZE(sBufferControl.offset_bits) - lEventsEnd;
+
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* Write end event etc. and increment buffers_produced. The
+ time used here is what the locking version uses as well. */
+ finalize_buffer(sBufferControl.index & sBufferControl.index_mask, lSizeLost, &sBufferStartTime);
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+}
+
+/**
+ * discard_check: - Determine whether an event should be discarded.
+ * @pmOldIndex: index into trace buffer where check for space should begin
+ * @pmLen: the length of the event to check
+ * @pmTimestamp: the time of the end-buffer event
+ *
+ * Checks whether an event of size pmLen will fit into the available
+ * buffer space as indicated by the value in pmOldIndex. A side effect
+ * of this function is that if the length would fill or overflow the
+ * last available buffer, that buffer will be finalized and all
+ * subsequent events will be automatically discarded until a buffer is
+ * later freed.
+ *
+ * The return value contains the result flags and is an ORed combination
+ * of the following:
+ *
+ * LTT_EVENT_DISCARD_NONE - event should not be discarded
+ * LTT_BUFFER_SWITCH - buffer switch occurred
+ * LTT_EVENT_DISCARD - event should be discarded (all buffers are full)
+ * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer
+ */
+static inline int discard_check(u32 pmOldIndex,
+ u32 pmLen,
+ struct timeval *pmTimestamp)
+{
+ u32 lBuffersReady;
+ u32 lOffsetMask = sBufferControl.offset_mask;
+ u8 lOffsetBits = sBufferControl.offset_bits;
+ u32 lIndexMask = sBufferControl.index_mask;
+ u32 lSizeLost;
+ unsigned long int lFlags; /* CPU flags for lock */
+
+ /* Check whether the event is larger than a buffer */
+ if(pmLen >= TRACE_BUFFER_SIZE(sBufferControl.offset_bits))
+ return LTT_EVENT_DISCARD | LTT_EVENT_TOO_LONG;
+
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* We're already overrun, nothing left to do */
+ if(sBuffersFull == TRUE) {
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ return LTT_EVENT_DISCARD;
+ }
+
+ lBuffersReady = sBufferControl.buffers_produced - sBufferControl.buffers_consumed;
+
+ /* If this happens, we've been pushed to the edge of the last
+ available buffer which means we need to finalize it and increment
+ buffers_produced. However, we don't want to allow
+ sBufferControl.index to be actually pushed to full or beyond,
+ otherwise we'd just be wrapping around and allowing subsequent
+ events to overwrite good buffers. It is true that there may not
+ be enough space for this event, but there could be space for
+ subsequent smaller event(s). It doesn't matter if they write
+ themselves, because here we say that anything after the old_index
+ passed in to this function is lost, even if other events have or
+ will reserve space in this last buffer. Nor can any other event
+ reserve space in buffers following this one, until at least one
+ buffer is consumed by the daemon. */
+ if(lBuffersReady == sBufferControl.n_buffers - 1) {
+ /* We set this flag so we only do this once per overrun */
+ sBuffersFull = TRUE;
+
+ /* Get the time of the event */
+ do_gettimeofday(pmTimestamp);
+
+ /* Size lost is everything after old_index */
+ lSizeLost = TRACE_BUFFER_SIZE(lOffsetBits)
+ - TRACE_BUFFER_OFFSET_GET(pmOldIndex, lOffsetMask);
+
+ /* Write end event and lost size. This increases buffer_count
+ by the lost size, which is important later when we add the
+ deferred size. */
+ finalize_buffer(pmOldIndex & lIndexMask, lSizeLost, pmTimestamp);
+
+ /* We need to add the lost size to old index, but we can't
+ do it now, or we'd roll index over and allow new events,
+ so we defer it until a buffer is free. Note however that
+ buffer_count does get incremented by lost size, which is
+ important later when start logging again. */
+ sLastEventIndex = pmOldIndex;
+ sLastEventTimeStamp = *pmTimestamp;
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ /* We lose this event */
+ return LTT_BUFFER_SWITCH | LTT_EVENT_DISCARD;
+ }
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ /* Nothing untoward happened */
+ return LTT_EVENT_DISCARD_NONE;
+}
+
+/**
+ * trace_reserve_slow: - The slow reserve path in the lockless scheme.
+ * @pmOldIndex: the value of the buffer control index when we were called
+ * @pmLen: the length of the slot to reserve
+ * @pmIndex: variable that will receive the start pos of the reserved slot
+ * @pmTimestamp: variable that will receive the time the slot was reserved
+ *
+ * Called by trace_reserve() if the length of the event being logged would
+ * most likely cause a 'buffer switch'. The value of the variable pointed
+ * to by pmIndex will contain the index actually reserved by this
+ * function. The timestamp reflecting the time the slot was reserved
+ * will be saved in *pmTimestamp. The return value indicates whether
+ * there actually was a buffer switch (not inevitable in all cases).
+ * If the return value also indicates a discarded event, the values in
+ * *pmIndex and *pmTimestamp will be indeterminate.
+ *
+ * The return value contains the result flags and is an ORed combination
+ * of the following:
+ *
+ * LTT_BUFFER_SWITCH_NONE - no buffer switch occurred
+ * LTT_EVENT_DISCARD_NONE - event should not be discarded
+ * LTT_BUFFER_SWITCH - buffer switch occurred
+ * LTT_EVENT_DISCARD - event should be discarded (all buffers are full)
+ * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer
+ */
+static inline int trace_reserve_slow(u32 pmOldIndex, /* needed for overruns */
+ u32 pmLen,
+ u32 *pmIndex,
+ struct timeval *pmTimestamp)
+{
+ u32 lNewIndex, lOffset, lNewBufno;
+ unsigned long int lFlags; /* CPU flags for lock */
+ u32 lOffsetMask = sBufferControl.offset_mask;
+ u8 lOffsetBits = sBufferControl.offset_bits;
+ u32 lIndexMask = sBufferControl.index_mask;
+ u32 lSizeLost = sEndReserve; /* size lost always includes end event */
+ int lDiscardEvent;
+ int lBufferSwitched = LTT_BUFFER_SWITCH_NONE;
+
+ /* We don't get here unless the event might cause a buffer switch */
+
+ /* First check whether conditions exist do discard the event */
+ lDiscardEvent = discard_check(pmOldIndex, pmLen, pmTimestamp);
+ if(lDiscardEvent != LTT_EVENT_DISCARD_NONE)
+ return lDiscardEvent;
+
+ /* If we're here, we still have free buffers to reserve from */
+
+ /* Do this until we reserve a spot for the event */
+ do {
+ /* Yeah, we're re-using a param variable, is that bad form? */
+ pmOldIndex = sBufferControl.index;
+
+ /* We're here because the event + ending reserve space would
+ overflow or exactly fill old buffer. Calculate new index
+ again. */
+ lNewIndex = pmOldIndex + pmLen;
+
+ /* We only care about the offset part of the new index */
+ lOffset = TRACE_BUFFER_OFFSET_GET(lNewIndex + sEndReserve, lOffsetMask);
+
+ /* If we would actually overflow and not exactly fill the old
+ buffer, we reserve the first slot (after adding a buffer
+ start event) in the new one. */
+ if((lOffset < pmLen) && (lOffset > 0)) {
+
+ /* This is an overflow, not an exact fit. The
+ reserved index is just after the space reserved for
+ the start event in the new buffer. */
+ *pmIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex + sEndReserve, lOffsetMask)
+ + sStartReserve;
+
+ /* Now the next free space is at the reserved index
+ plus the length of this event. */
+ lNewIndex = *pmIndex + pmLen;
+ } else if (lOffset < pmLen) {
+ /* We'll exactly fill the old buffer, so our reserved
+ index is still in the old buffer and our new index
+ is in the new one + sStartReserve */
+ *pmIndex = pmOldIndex;
+ lNewIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex + sEndReserve, lOffsetMask)
+ + sStartReserve;
+ } else
+ /* another event has actually pushed us into a new
+ buffer since we were called. */
+ *pmIndex = pmOldIndex;
+
+ /* Get the time of the event */
+ do_gettimeofday(pmTimestamp);
+ } while (!compare_and_store_volatile(&sBufferControl.index,
+ pmOldIndex, lNewIndex));
+
+ /* Once we're successful in saving a new_index as the authoritative
+ new global buffer control index, finish the buffer switch
+ processing. */
+
+ /* Mask off the high bits outside of our reserved index */
+ *pmIndex &= lIndexMask;
+
+ /* At this point, our indices are set in stone, so we can safely
+ write our start and end events and lost count to our buffers.
+ The first test here could fail if between the time reserve_slow
+ was called and we got a reserved slot, we slept and someone else
+ did the buffer switch already. */
+ if(lOffset < pmLen) { /* Event caused a buffer switch. */
+ if(lOffset > 0) /* We didn't exactly fill the old buffer */
+ /* Set the size lost value in the old buffer. That
+ value is len+sEndReserve-offset-sEndReserve,
+ i.e. sEndReserve cancels itself out. */
+ lSizeLost += pmLen - lOffset;
+ else /* We exactly filled the old buffer */
+ /* Since we exactly filled the old buffer, the index
+ we write the end event to is after the space
+ reserved for this event. */
+ pmOldIndex += pmLen;
+
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* Write end event etc. and increment buffers_produced. */
+ finalize_buffer(pmOldIndex & lIndexMask, lSizeLost, pmTimestamp);
+
+ /* If we're here, we had a normal buffer switch and need to
+ update the start buffer time before writing the event.
+ The start buffer time is the same as the event time for the
+ event reserved, and lTimeDelta of 0 but that also appears
+ to be the case in the locking version as well. */
+ sBufferStartTime = *pmTimestamp;
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ /* new_index is always valid here, since it's set correctly
+ if offset < len + sEndReserve, and we don't get here
+ unless that's true. The issue would be that if we didn't
+ actually switch buffers, new_index would be too large by
+ sEndReserve bytes. */
+ write_start_buffer_event(lNewIndex & lIndexMask, *pmTimestamp);
+
+ /* We initialize the new buffer by subtracting
+ TRACE_BUFFER_SIZE rather than directly initializing to
+ sStartReserve in case events have been already been added
+ to the new buffer under us. We subtract space for the start
+ buffer event from buffer size to leave room for the start
+ buffer event we just wrote. */
+ lNewBufno = TRACE_BUFFER_NUMBER_GET(lNewIndex & lIndexMask, lOffsetBits);
+ atomic_sub_volatile(&sBufferControl.fill_count[lNewBufno],
+ TRACE_BUFFER_SIZE(lOffsetBits) - sStartReserve);
+
+ /* We need to check whether fill_count is less than the
+ sStartReserve. If this test is true, it means that
+ subtracting the buffer size underflowed fill_count i.e.
+ fill_count represents an incomplete buffer. Any any case,
+ we're completely fubared and don't have any choice but to
+ start the new buffer out fresh. */
+ if(atomic_read(&sBufferControl.fill_count[lNewBufno]) < sStartReserve)
+ atomic_set_volatile(&sBufferControl.fill_count[lNewBufno], sStartReserve);
+
+ /* If we're here, there must have been a buffer switch */
+ lBufferSwitched = LTT_BUFFER_SWITCH;
+ }
+
+ return lBufferSwitched;
+}
+
+/**
+ * trace_reserve: - Reserve a slot in the trace buffer for an event.
+ * @pmLen: the length of the slot to reserve
+ * @pmIndex: variable that will receive the start pos of the reserved slot
+ * @pmTimestamp: variable that will receive the time the slot was reserved
+ *
+ * This is the fast path for reserving space in the trace buffer in the
+ * lockless tracing scheme. If a slot was successfully reserved, the
+ * caller can then at its leisure write data to the reserved space (at
+ * least until the space is reclaimed in an out-of-space situation).
+ *
+ * If the requested length would fill or exceed the current buffer, the
+ * slow path, trace_reserve_slow(), will be executed instead.
+ *
+ * The index reflecting the start position of the slot reserved will be
+ * saved in *pmIndex, and the timestamp reflecting the time the slot was
+ * reserved will be saved in *pmTimestamp. If the return value indicates
+ * a discarded event, the values in *pmIndex and *pmTimestamp will be
+ * indeterminate.
+ *
+ * The return value contains the result flags and is an ORed combination
+ * of the following:
+ *
+ * LTT_BUFFER_SWITCH_NONE - no buffer switch occurred
+ * LTT_EVENT_DISCARD_NONE - event should not be discarded
+ * LTT_BUFFER_SWITCH - buffer switch occurred
+ * LTT_EVENT_DISCARD - event should be discarded (all buffers are full)
+ * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer
+ */
+static inline int trace_reserve(u32 pmLen,
+ u32 *pmIndex,
+ struct timeval *pmTimestamp)
+{
+ u32 lOldIndex, lNewIndex, lOffset;
+ u32 lOffsetMask = sBufferControl.offset_mask;
+
+ /* Do this until we reserve a spot for the event */
+ do {
+ lOldIndex = sBufferControl.index;
+
+ /* If adding len + sEndReserve to the old index doesn't put us
+ into a new buffer, this is what the new index would be. */
+ lNewIndex = lOldIndex + pmLen;
+ lOffset = TRACE_BUFFER_OFFSET_GET(lNewIndex + sEndReserve, lOffsetMask);
+
+ /* If adding the length reserved for the end buffer event and
+ lost count to the new index would put us into a new buffer,
+ we need to do a buffer switch. If in between now and the
+ buffer switch another event that does fit comes in, no
+ problem because we check again in the slow version. In
+ either case, there will always be room for the end event
+ in the old buffer. The trick in this test is that adding
+ a length that would carry into the non-offset bits of the
+ index results in the offset portion being smaller than the
+ length that was added. */
+ if(lOffset < pmLen)
+ /* We would roll over into a new buffer, need to do
+ buffer switch processing. */
+ return trace_reserve_slow(lOldIndex, pmLen, pmIndex, pmTimestamp);
+
+ /* Get the time of the event */
+ do_gettimeofday(pmTimestamp);
+ } while (!compare_and_store_volatile(&sBufferControl.index,
+ lOldIndex, lNewIndex));
+
+ /* Once we're successful in saving a new_index as the authoritative
+ new global buffer control index, we can return old_index, the
+ successfully reserved index. */
+
+ /* Return the reserved index value */
+ *pmIndex = lOldIndex & sBufferControl.index_mask;
+
+ return LTT_BUFFER_SWITCH_NONE; /* No buffer switch occurred */
+}
+
+/**
+ * lockless_write_event: - Locklessly reserves space and writes an event.
+ * @pmEventID: event id
+ * @pmEventStruct: event details
+ * @pmDataSize: total event size
+ * @pmCPUID: CPU ID associated with event
+ * @pmVarDataBeg: ptr to variable-length data for the event
+ * @pmVarDataLen: length of variable-length data for the event
+ *
+ * This is the main event-writing function for the lockless scheme. It
+ * reserves space for an event if possible, writes the event and signals
+ * the daemon if it caused a buffer switch.
+ */
+int lockless_write_event(u8 pmEventID,
+ void *pmEventStruct,
+ uint16_t pmDataSize,
+ u8 pmCPUID,
+ void *pmVarDataBeg,
+ int pmVarDataLen)
+{
+ u32 lReservedIndex;
+ struct timeval lTime;
+ trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */
+ struct siginfo lSigInfo; /* Signal information */
+ int lReserveRC;
+ char* lWritPos; /* Current position for writing */
+ int lRC = 0;
+
+ /* Reserve space for the event. If the space reserved is in a new
+ buffer, note that fact. */
+ lReserveRC = trace_reserve((u32)pmDataSize,
+ &lReservedIndex, &lTime);
+
+ /* Exact lost event count isn't important to anyone, so this is OK. */
+ if(lReserveRC & LTT_EVENT_DISCARD)
+ sEventsLost++;
+
+ /* We don't write the event, but we still need to signal */
+ if((lReserveRC & LTT_BUFFER_SWITCH) &&
+ (lReserveRC & LTT_EVENT_DISCARD)) {
+ lRC = -ENOMEM;
+ goto send_buffer_switch_signal;
+ }
+
+ /* no buffer space left, discard event. */
+ if((lReserveRC & LTT_EVENT_DISCARD) ||
+ (lReserveRC & LTT_EVENT_TOO_LONG))
+ /* return value for trace() */
+ return -ENOMEM;
+
+ /* The position we write to in the trace memory area is simply the
+ beginning of trace memory plus the index we just reserved. */
+ lWritPos = sTracBuf + lReservedIndex;
+ /* Compute the time delta between this event and the time at which
+ this buffer was started */
+ lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000
+ + (lTime.tv_usec - sBufferStartTime.tv_usec);
+
+ /* Write the CPUID to the tracing buffer, if required */
+ if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START))
+ tracer_write_to_buffer(lWritPos,
+ &pmCPUID,
+ sizeof(pmCPUID));
+
+ /* Write event type to tracing buffer */
+ tracer_write_to_buffer(lWritPos,
+ &pmEventID,
+ sizeof(pmEventID));
+
+ /* Write event time delta to tracing buffer */
+ tracer_write_to_buffer(lWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+
+ /* Do we log event details */
+ if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) {
+ /* Write event structure */
+ tracer_write_to_buffer(lWritPos,
+ pmEventStruct,
+ sEventStructSize[pmEventID]);
+
+ /* Write string if any */
+ if (pmVarDataLen)
+ tracer_write_to_buffer(lWritPos,
+ pmVarDataBeg,
+ pmVarDataLen);
+ }
+ /* Write the length of the event description */
+ tracer_write_to_buffer(lWritPos,
+ &pmDataSize,
+ sizeof(pmDataSize));
+
+ /* We've written the event - update the fill_count for the buffer. */
+ trace_commit(lReservedIndex, (u32)pmDataSize);
+
+send_buffer_switch_signal:
+
+ /* Signal the daemon if we switched buffers */
+ if(lReserveRC & LTT_BUFFER_SWITCH) {
+ /* Setup signal information */
+ lSigInfo.si_signo = SIGIO;
+ lSigInfo.si_errno = 0;
+ lSigInfo.si_code = SI_KERNEL;
+
+#if 0
+ /* DEBUG */
+ printk("<1> Sending SIGIO to %d \n", sDaemonTaskStruct->pid);
+#endif
+ /* Signal the tracing daemon */
+ send_sig_info(SIGIO, &lSigInfo, sDaemonTaskStruct);
+ }
+
+ return lRC;
+}
+
+/**
+ * continue_trace: - Continue a stopped trace.
+ *
+ * Continue a trace that's been temporarily stopped because all buffers
+ * were full.
+ */
+static inline void continue_trace(void)
+{
+ int lDiscardSize;
+ u32 lLastEventBufno;
+ u32 lLastBufferLostSize;
+ u32 lLastEventOffset;
+ u32 lNewIndex;
+
+ /* A buffer's been consumed, and as we've been waiting around at the
+ end of the last one produced, the one after that must now be free */
+ int lFreedBufno = sBufferControl.buffers_produced % sBufferControl.n_buffers;
+
+ /* Start the new buffer out at the beginning */
+ atomic_set_volatile(&sBufferControl.fill_count[lFreedBufno], sStartReserve);
+
+ /* In the all-buffers-full case, sBufferControl.index is frozen at the
+ position of the first event that would have caused a buffer switch.
+ However, the fill_count for that buffer is not frozen and reflects
+ not only the lost size calculated at that point, but also any
+ smaller events that managed to write themselves at the end of the
+ last buffer (because there's technically still space at the end,
+ though it and all those contained events will be erased here).
+ Here we try to salvage if possible that last buffer, but to do
+ that, we need to subtract those pesky smaller events that managed
+ to get in. If after all that, another small event manages to
+ sneak in in the time it takes us to do this, well, we concede and
+ the daemon will toss that buffer. It's not the end of the world
+ if that happens, since that buffer actually marked the start of a
+ bunch of lost events which continues until a buffer is freed. */
+
+ /* Get the bufno and offset of the buffer containing the last event
+ logged before we had to stop for a buffer-full condition. */
+ lLastEventOffset = TRACE_BUFFER_OFFSET_GET(sLastEventIndex, sBufferControl.offset_mask);
+ lLastEventBufno = TRACE_BUFFER_NUMBER_GET(sLastEventIndex, sBufferControl.offset_bits);
+
+ /* We also need to know the lost size we wrote to that buffer when we
+ stopped */
+ lLastBufferLostSize = TRACE_BUFFER_SIZE(sBufferControl.offset_bits) - lLastEventOffset;
+
+ /* Since the time we stopped, some smaller events probably reserved
+ space and wrote themselves in, the sizes of which would have been
+ reflected in the fill_count. The total size of these events is
+ calculated here. */
+ lDiscardSize = atomic_read(&sBufferControl.fill_count[lLastEventBufno])
+ - lLastEventOffset
+ - lLastBufferLostSize;
+
+ /* If there were events written after we stopped, subtract those from
+ the fill_count. If that doesn't fix things, the buffer either is
+ really incomplete, or another event snuck in, and we'll just stop
+ now and say we did what we could for it. */
+ if(lDiscardSize > 0)
+ atomic_sub_volatile(&sBufferControl.fill_count[lLastEventBufno], lDiscardSize);
+
+ /* Since our end buffer event probably got trounced, rewrite it in old
+ buffer. */
+ write_end_buffer_event(sLastEventIndex & sBufferControl.index_mask, sLastEventTimeStamp);
+
+ /* We also need to update the buffer start time and write the start
+ event for the next buffer, since we couldn't do it until now */
+ do_gettimeofday(&sBufferStartTime);
+
+ /* The current buffer control index is hanging around near the end of
+ the last buffer. So we add the buffer size and clear the offset to
+ get to the beginning of the newly freed buffer. */
+ lNewIndex = sBufferControl.index + TRACE_BUFFER_SIZE(sBufferControl.offset_bits);
+ lNewIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex, sBufferControl.offset_mask) + sStartReserve;
+ write_start_buffer_event(lNewIndex & sBufferControl.index_mask, sBufferStartTime);
+
+ /* Fixing up sBufferControl.index is simpler. Since a buffer has been
+ consumed, there's now at least one buffer free, and we can continue.
+ We start off the next buffer in a fresh state. Since nothing else
+ can be meaningfully updating the buffer control index, we can safely
+ do that here. 'Meaningfully' means that there may be cases of
+ smaller events managing to update the index in the last buffer but
+ they're essentially erased by the lost size of that buffer when
+ sBuffersFull was set. We need to restart the index at the beginning
+ of the next available buffer before turning off sBuffersFull, and
+ avoid an erroneous buffer switch. */
+ sBufferControl.index = lNewIndex;
+
+ /* Now we can continue reserving events */
+ sBuffersFull = FALSE;
+}
+
+/**
+ * tracer_set_n_buffers: - Sets the number of buffers.
+ * @pmNBuffers: number of buffers.
+ *
+ * Sets the number of buffers containing the trace data, valid only for
+ * lockless scheme, must be a power of 2.
+ *
+ * Returns:
+ *
+ * 0, Size setting went OK
+ * -EINVAL, not a power of 2
+ */
+int tracer_set_n_buffers(int pmNBuffers)
+{
+ if(hweight32(pmNBuffers) != 1) /* Invalid if # set bits in word != 1 */
+ return -EINVAL;
+
+ /* Find position of one and only set bit */
+ sBufnoBits = ffs(pmNBuffers) - 1;
+
+ return 0;
+}
+#else
+static void init_buffer_control(struct buffer_control * pmBC,
+ int pmUseLockless,
+ u8 pmBufnoBits,
+ u8 pmOffsetBits)
+{
+ pmBC->using_lockless = pmUseLockless;
+}
+static inline void write_start_buffer_event(u32 pmIndex, struct timeval pmTime)
+{
+}
+static inline void finalize_lockless_trace(void)
+{
+}
+static inline void continue_trace(void)
+{
+}
+int tracer_set_n_buffers(int pmNBuffers)
+{
+ return -EINVAL;
+}
+#endif /* CONFIG_LOCKLESS_TRACE */
+
+/**
+ * trace: - Tracing function per se.
+ * @pmEventID: ID of event as defined in linux/trace.h
+ * @pmEventStruct: struct describing the event
+ *
+ * Returns:
+ * 0, if everything went OK (event got registered)
+ * -ENODEV, no tracing daemon opened the driver.
+ * -ENOMEM, no more memory to store events.
+ * -EBUSY, tracer not started yet.
+ *
+ * Note:
+ * The kernel has to be locked here because trace() could be called from
+ * an interrupt handling routine and from process service routine.
+ */
+int trace(u8 pmEventID,
+ void *pmEventStruct)
+{
+ int lVarDataLen = 0; /* Length of variable length data to be copied, if any */
+ void *lVarDataBeg = NULL; /* Begining of variable length data to be copied */
+ int lSendSignal = FALSE; /* Should the daemon be summoned */
+ u8 lCPUID; /* CPUID of currently runing process */
+ uint16_t lDataSize; /* Size of tracing data */
+ struct siginfo lSigInfo; /* Signal information */
+ struct timeval lTime; /* Event time */
+ unsigned long int lFlags; /* CPU flags for lock */
+ trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */
+ struct task_struct *pIncomingProcess = NULL; /* Pointer to incoming process */
+
+ /* Is there a tracing daemon */
+ if (sDaemonTaskStruct == NULL)
+ return -ENODEV;
+
+ /* Is this the exit of a process? */
+ if ((pmEventID == TRACE_EV_PROCESS) &&
+ (pmEventStruct != NULL) &&
+ ((((trace_process *) pmEventStruct)->event_sub_id) == TRACE_EV_PROCESS_EXIT))
+ trace_destroy_owners_events(current->pid);
+
+ /* Do we trace the event */
+ if ((sTracerStarted == TRUE) || (pmEventID == TRACE_EV_START) || (pmEventID == TRACE_EV_BUFFER_START))
+ goto TraceEvent;
+
+ return -EBUSY;
+
+TraceEvent:
+ /* Are we monitoring this event */
+ if (!ltt_test_bit(pmEventID, &sTracedEvents))
+ return 0;
+
+ /* Always let the start event pass, whatever the IDs */
+ if ((pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) {
+ /* Is this a scheduling change */
+ if (pmEventID == TRACE_EV_SCHEDCHANGE) {
+ /* Get pointer to incoming process */
+ pIncomingProcess = (struct task_struct *) (((trace_schedchange *) pmEventStruct)->in);
+
+ /* Set PID information in schedchange event */
+ (((trace_schedchange *) pmEventStruct)->in) = pIncomingProcess->pid;
+ }
+ /* Are we monitoring a particular process */
+ if ((sTracingPID == TRUE) && (current->pid != sTracedPID)) {
+ /* Record this event if it is the scheduling change bringing in the traced PID */
+ if (pIncomingProcess == NULL)
+ return 0;
+ else if (pIncomingProcess->pid != sTracedPID)
+ return 0;
+ }
+ /* Are we monitoring a particular process group */
+ if ((sTracingPGRP == TRUE) && (current->pgrp != sTracedPGRP)) {
+ /* Record this event if it is the scheduling change bringing in a process of the traced PGRP */
+ if (pIncomingProcess == NULL)
+ return 0;
+ else if (pIncomingProcess->pgrp != sTracedPGRP)
+ return 0;
+ }
+ /* Are we monitoring the processes of a given group of users */
+ if ((sTracingGID == TRUE) && (current->egid != sTracedGID)) {
+ /* Record this event if it is the scheduling change bringing in a process of the traced GID */
+ if (pIncomingProcess == NULL)
+ return 0;
+ else if (pIncomingProcess->egid != sTracedGID)
+ return 0;
+ }
+ /* Are we monitoring the processes of a given user */
+ if ((sTracingUID == TRUE) && (current->euid != sTracedUID)) {
+ /* Record this event if it is the scheduling change bringing in a process of the traced UID */
+ if (pIncomingProcess == NULL)
+ return 0;
+ else if (pIncomingProcess->euid != sTracedUID)
+ return 0;
+ }
+ }
+
+ /* Compute size of tracing data */
+ lDataSize = sizeof(pmEventID) + sizeof(lTimeDelta) + sizeof(lDataSize);
+
+ /* Do we log the event details */
+ if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) {
+ /* Update the size of the data entry */
+ lDataSize += sEventStructSize[pmEventID];
+
+ /* Some events have variable length */
+ switch (pmEventID) {
+ /* Is there a file name in this */
+ case TRACE_EV_FILE_SYSTEM:
+ if ((((trace_file_system *) pmEventStruct)->event_sub_id == TRACE_EV_FILE_SYSTEM_EXEC)
+ || (((trace_file_system *) pmEventStruct)->event_sub_id == TRACE_EV_FILE_SYSTEM_OPEN)) {
+ /* Remember the string's begining and update size variables */
+ lVarDataBeg = ((trace_file_system *) pmEventStruct)->file_name;
+ lVarDataLen = ((trace_file_system *) pmEventStruct)->event_data2 + 1;
+ lDataSize += (uint16_t) lVarDataLen;
+ }
+ break;
+
+ /* Logging of a custom event */
+ case TRACE_EV_CUSTOM:
+ lVarDataBeg = ((trace_custom *) pmEventStruct)->data;
+ lVarDataLen = ((trace_custom *) pmEventStruct)->data_size;
+ lDataSize += (uint16_t) lVarDataLen;
+ break;
+ }
+ }
+
+ /* Do we record the CPUID */
+ if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) {
+ /* Remember the CPUID */
+ lCPUID = smp_processor_id();
+
+ /* Update the size of the data entry */
+ lDataSize += sizeof(lCPUID);
+ }
+
+#if CONFIG_LOCKLESS_TRACE
+/* Lock-free event-writing isn't available without cmpxchg */
+#if __HAVE_ARCH_CMPXCHG
+ /* If we're using the lockless scheme, we preempt the default path
+ here - nothing after this point in this function will be executed.
+ Note that even if we do have cmpxchg, we still want to have a
+ choice between the lock-free and locking schemes at run-time, thus
+ the using_lockless check. This used to be implemented as a kernel
+ hook, and will be again when/if kernel hooks are accepted into the
+ kernel. */
+ if(sBufferControl.using_lockless)
+ return lockless_write_event(pmEventID,
+ pmEventStruct,
+ lDataSize,
+ lCPUID,
+ lVarDataBeg,
+ lVarDataLen);
+#endif /* __HAVE_ARCH_CMPXCHG */
+#endif /* CONFIG_LOCKLESS_TRACE */
+
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* The following time calculations have to be done within the spinlock because
+ otherwise the event order could be inverted. */
+
+ /* Get the time of the event */
+ do_gettimeofday(&lTime);
+
+ /* Compute the time delta between this event and the time at which this buffer was started */
+ lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000
+ + (lTime.tv_usec - sBufferStartTime.tv_usec);
+
+ /* Is there enough space left in the write buffer */
+ if (sWritPos + lDataSize > sWritLimit) {
+ /* Have we already switched buffers and informed the daemon of it */
+ if (sSignalSent == TRUE) {
+ /* We've lost another event */
+ sEventsLost++;
+
+ /* Bye, bye, now */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+ return -ENOMEM;
+ }
+ /* We need to inform the daemon */
+ lSendSignal = TRUE;
+
+ /* Switch buffers */
+ tracer_switch_buffers(lTime);
+
+ /* Recompute the time delta since sBufferStartTime has changed because of the buffer change */
+ lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000
+ + (lTime.tv_usec - sBufferStartTime.tv_usec);
+ }
+ /* Write the CPUID to the tracing buffer, if required */
+ if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START))
+ tracer_write_to_buffer(sWritPos,
+ &lCPUID,
+ sizeof(lCPUID));
+
+ /* Write event type to tracing buffer */
+ tracer_write_to_buffer(sWritPos,
+ &pmEventID,
+ sizeof(pmEventID));
+
+ /* Write event time delta to tracing buffer */
+ tracer_write_to_buffer(sWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+
+ /* Do we log event details */
+ if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) {
+ /* Write event structure */
+ tracer_write_to_buffer(sWritPos,
+ pmEventStruct,
+ sEventStructSize[pmEventID]);
+
+ /* Write string if any */
+ if (lVarDataLen)
+ tracer_write_to_buffer(sWritPos,
+ lVarDataBeg,
+ lVarDataLen);
+ }
+ /* Write the length of the event description */
+ tracer_write_to_buffer(sWritPos,
+ &lDataSize,
+ sizeof(lDataSize));
+
+ /* Should the tracing daemon be notified */
+ if (lSendSignal == TRUE) {
+ /* Remember that a signal has been sent */
+ sSignalSent = TRUE;
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ /* Setup signal information */
+ lSigInfo.si_signo = SIGIO;
+ lSigInfo.si_errno = 0;
+ lSigInfo.si_code = SI_KERNEL;
+
+ /* DEBUG */
+#if 0
+ printk("<1> Sending SIGIO to %d \n", sDaemonTaskStruct->pid);
+#endif
+
+ /* Signal the tracing daemon */
+ send_sig_info(SIGIO, &lSigInfo, sDaemonTaskStruct);
+ } else
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ return 0;
+}
+
+/**
+ * tracer_switch_buffers: - Switches between read and write buffers.
+ * @pmTime: current time.
+ *
+ * Put the current write buffer to be read and reset put the old read
+ * buffer to be written to. Set the tracer variables in consequence.
+ *
+ * No return values.
+ *
+ * This should be called from within a spin_lock.
+ */
+void tracer_switch_buffers(struct timeval pmTime)
+{
+ char *lTempBuf; /* Temporary buffer pointer */
+ char *lTempBufEnd; /* Temporary buffer end pointer */
+ char *lInitWritPos; /* Initial write position */
+ u8 lEventID; /* Event ID of last event */
+ u8 lCPUID; /* CPUID of currently runing process */
+ uint16_t lDataSize; /* Size of tracing data */
+ u32 lSizeLost; /* Size delta between last event and end of buffer */
+ trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */
+ trace_buffer_start lStartBufferEvent; /* Start of the new buffer event */
+
+ /* Remember initial write position */
+ lInitWritPos = sWritPos;
+
+ /* Write the end event at the write of the buffer */
+
+ /* Write the CPUID to the tracing buffer, if required */
+ if (sLogCPUID == TRUE) {
+ lCPUID = smp_processor_id();
+ tracer_write_to_buffer(sWritPos,
+ &lCPUID,
+ sizeof(lCPUID));
+ }
+ /* Write event type to tracing buffer */
+ lEventID = TRACE_EV_BUFFER_END;
+ tracer_write_to_buffer(sWritPos,
+ &lEventID,
+ sizeof(lEventID));
+
+ /* Write event time delta to tracing buffer */
+ lTimeDelta = 0;
+ tracer_write_to_buffer(sWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+
+ /* Get size lost */
+ lSizeLost = sWritBufEnd - lInitWritPos;
+
+ /* Write size lost at the end of the buffer */
+ *((u32 *) (sWritBufEnd - sizeof(lSizeLost))) = lSizeLost;
+
+ /* Switch buffers */
+ lTempBuf = sReadBuf;
+ sReadBuf = sWritBuf;
+ sWritBuf = lTempBuf;
+
+ /* Set buffer ends */
+ lTempBufEnd = sReadBufEnd;
+ sReadBufEnd = sWritBufEnd;
+ sWritBufEnd = lTempBufEnd;
+
+ /* Set read limit */
+ sReadLimit = sReadBufEnd;
+
+ /* Set write limit */
+ sWritLimit = sWritBufEnd - TRACER_LAST_EVENT_SIZE;
+
+ /* Set write position */
+ sWritPos = sWritBuf;
+
+ /* Increment buffer ID */
+ sBufferID++;
+
+ /* Set the time of begining of this buffer */
+ sBufferStartTime = pmTime;
+
+ /* Write the start of buffer event */
+ lStartBufferEvent.ID = sBufferID;
+ lStartBufferEvent.Time = pmTime;
+
+ /* Write event type to tracing buffer */
+ lEventID = TRACE_EV_BUFFER_START;
+ tracer_write_to_buffer(sWritPos,
+ &lEventID,
+ sizeof(lEventID));
+
+ /* Write event time delta to tracing buffer */
+ lTimeDelta = 0;
+ tracer_write_to_buffer(sWritPos,
+ &lTimeDelta,
+ sizeof(lTimeDelta));
+
+ /* Write event structure */
+ tracer_write_to_buffer(sWritPos,
+ &lStartBufferEvent,
+ sizeof(lStartBufferEvent));
+
+ /* Compute the data size */
+ lDataSize = sizeof(lEventID)
+ + sizeof(lTimeDelta)
+ + sizeof(lStartBufferEvent)
+ + sizeof(lDataSize);
+
+ /* Write the length of the event description */
+ tracer_write_to_buffer(sWritPos,
+ &lDataSize,
+ sizeof(lDataSize));
+}
+
+/**
+ * tracer_ioctl: - "Ioctl" file op
+ *
+ * @pmInode: the inode associated with the device
+ * @pmFile: file structure given to the acting process
+ * @pmCmd: command given by the caller
+ * @pmArg: arguments to the command
+ *
+ * Returns:
+ * >0, In case the caller requested the number of events lost.
+ * 0, Everything went OK
+ * -ENOSYS, no such command
+ * -EINVAL, tracer not properly configured
+ * -EBUSY, tracer can't be reconfigured while in operation
+ * -ENOMEM, no more memory
+ * -EFAULT, unable to access user space memory
+ *
+ * Note:
+ * In the future, this function should check to make sure that it's the
+ * server that make thes ioctl.
+ */
+int tracer_ioctl(struct inode *pmInode,
+ struct file *pmFile,
+ unsigned int pmCmd,
+ unsigned long pmArg)
+{
+ int lRetValue; /* Function return value */
+ int lDevMinor; /* Device minor number */
+ int lNewUserEventID; /* ID of newly created user event */
+ trace_start lStartEvent; /* Event marking the begining of the trace */
+ unsigned long int lFlags; /* CPU flags for lock */
+ trace_custom lUserEvent; /* The user event to be logged */
+ trace_change_mask lTraceMask; /* Event mask */
+ trace_new_event lNewUserEvent; /* The event to be created for the user */
+ trace_buffer_start lStartBufferEvent; /* Start of the new buffer event */
+
+ /* Get device's minor number */
+ lDevMinor = minor(pmInode->i_rdev) & 0x0f;
+
+ /* If the tracer is started, the daemon can't modify the configuration */
+ if ((lDevMinor == 0)
+ && (sTracerStarted == TRUE)
+ && (pmCmd != TRACER_STOP)
+ && (pmCmd != TRACER_DATA_COMITTED)
+ && (pmCmd != TRACER_GET_BUFFER_CONTROL))
+ return -EBUSY;
+
+ /* Only some operations are permitted to user processes trying to log events */
+ if ((lDevMinor == 1)
+ && (pmCmd != TRACER_CREATE_USER_EVENT)
+ && (pmCmd != TRACER_DESTROY_USER_EVENT)
+ && (pmCmd != TRACER_TRACE_USER_EVENT)
+ && (pmCmd != TRACER_SET_EVENT_MASK)
+ && (pmCmd != TRACER_GET_EVENT_MASK))
+ return -ENOSYS;
+
+ /* Depending on the command executed */
+ switch (pmCmd) {
+ /* Start the tracer */
+ case TRACER_START:
+ /* Initialize buffer control regardless of scheme in use */
+ init_buffer_control(&sBufferControl,
+ !sUseLocking, /* using_lockless */
+ sBufnoBits, /* bufno_bits, 2**n */
+ sBufOffsetBits); /* offset_bits, 2**n */
+
+ /* Check if the device has been properly set up */
+ if (((sUseSyscallEIPBounds == TRUE)
+ && (sSyscallEIPDepthSet == TRUE))
+ || ((sUseSyscallEIPBounds == TRUE)
+ && ((sLowerEIPBoundSet != TRUE)
+ || (sUpperEIPBoundSet != TRUE)))
+ || ((sTracingPID == TRUE)
+ && (sTracingPGRP == TRUE)))
+ return -EINVAL;
+
+ /* Set the kernel-side trace configuration */
+ if (trace_set_config(trace,
+ sSyscallEIPDepthSet,
+ sUseSyscallEIPBounds,
+ sSyscallEIPDepth,
+ sLowerEIPBound,
+ sUpperEIPBound) < 0)
+ return -EINVAL;
+
+ /* Always log the start event and the buffer start event */
+ ltt_set_bit(TRACE_EV_BUFFER_START, &sTracedEvents);
+ ltt_set_bit(TRACE_EV_BUFFER_START, &sLogEventDetailsMask);
+ ltt_set_bit(TRACE_EV_START, &sTracedEvents);
+ ltt_set_bit(TRACE_EV_START, &sLogEventDetailsMask);
+ ltt_set_bit(TRACE_EV_CHANGE_MASK, &sTracedEvents);
+ ltt_set_bit(TRACE_EV_CHANGE_MASK, &sLogEventDetailsMask);
+
+ /* Get the time of start */
+ do_gettimeofday(&sBufferStartTime);
+
+ /* Set the event description */
+ lStartBufferEvent.ID = sBufferID;
+ lStartBufferEvent.Time = sBufferStartTime;
+
+ /* Set the event description */
+ lStartEvent.MagicNumber = TRACER_MAGIC_NUMBER;
+ lStartEvent.ArchType = TRACE_ARCH_TYPE;
+ lStartEvent.ArchVariant = TRACE_ARCH_VARIANT;
+ lStartEvent.SystemType = TRACE_SYS_TYPE_VANILLA_LINUX;
+ lStartEvent.MajorVersion = TRACER_VERSION_MAJOR;
+ lStartEvent.MinorVersion = TRACER_VERSION_MINOR;
+ lStartEvent.BufferSize = sBufSize;
+ lStartEvent.EventMask = sTracedEvents;
+ lStartEvent.DetailsMask = sLogEventDetailsMask;
+ lStartEvent.LogCPUID = sLogCPUID;
+
+ /* Trace the buffer start event using the appropriate method depending on the locking scheme */
+ if(sBufferControl.using_lockless == TRUE)
+ write_start_buffer_event(sBufferControl.index & sBufferControl.index_mask,
+ sBufferStartTime);
+ else
+ trace(TRACE_EV_BUFFER_START, &lStartBufferEvent);
+
+ /* Trace the start event */
+ trace(TRACE_EV_START, &lStartEvent);
+
+ /* Start tapping into Linux's syscall flow */
+ syscall_entry_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_ENTRY, &sTracedEvents);
+ syscall_exit_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_EXIT, &sTracedEvents);
+
+ /* We can start tracing */
+ sTracerStarted = TRUE;
+
+ /* Reregister custom trace events created earlier */
+ trace_reregister_custom_events();
+ break;
+
+ /* Stop the tracer */
+ case TRACER_STOP:
+ /* Stop tracing */
+ /* We don't log new events, but old lockless ones can finish */
+ sTracerStarted = FALSE;
+
+ /* Stop interrupting the normal flow of system calls */
+ syscall_entry_trace_active = 0;
+ syscall_exit_trace_active = 0;
+
+ /* Make sure the last buffer touched is finalized */
+ if(sBufferControl.using_lockless) {
+ /* Write end buffer event as last event in old buf. */
+ finalize_lockless_trace();
+ break;
+ } /* Else locking scheme */
+
+ /* Acquire the lock to avoid SMP case of where another CPU is writing a trace
+ while buffer is being switched */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* Switch the buffers to ensure that the end of the buffer mark is set (time isn't important) */
+ tracer_switch_buffers(sBufferStartTime);
+
+ /* Release lock */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+ break;
+
+ /* Set the tracer to the default configuration */
+ case TRACER_CONFIG_DEFAULT:
+ tracer_set_default_config();
+ break;
+
+ /* Set the memory buffers the daemon wants us to use */
+ case TRACER_CONFIG_MEMORY_BUFFERS:
+ /* Is the given size "reasonable" */
+ if (sUseLocking == TRUE) {
+ if (pmArg < TRACER_MIN_BUF_SIZE)
+ return -EINVAL;
+ } else {
+ if ((pmArg < TRACER_LOCKLESS_MIN_BUF_SIZE) ||
+ (pmArg > TRACER_LOCKLESS_MAX_BUF_SIZE))
+ return -EINVAL;
+ }
+
+ /* Set the buffer's size */
+ return tracer_set_buffer_size(pmArg);
+ break;
+
+ /* Set the number of memory buffers the daemon wants us to use */
+ case TRACER_CONFIG_N_MEMORY_BUFFERS:
+ /* Is the given size "reasonable" */
+ if ((sUseLocking == TRUE) || (pmArg < TRACER_MIN_BUFFERS) ||
+ (pmArg > TRACER_MAX_BUFFERS))
+ return -EINVAL;
+
+ /* Set the number of buffers */
+ return tracer_set_n_buffers(pmArg);
+ break;
+
+ /* Set locking scheme the daemon wants us to use */
+ case TRACER_CONFIG_USE_LOCKING:
+ /* Set the locking scheme in a global for later */
+ sUseLocking = pmArg;
+#if !(CONFIG_LOCKLESS_TRACE && __HAVE_ARCH_CMPXCHG)
+ if(sUseLocking == FALSE) /* Trying to use lock-free scheme */
+ /* Lock-free scheme not supported on this platform */
+ return -EINVAL;
+#endif
+ break;
+
+ /* Trace the given events */
+ case TRACER_CONFIG_EVENTS:
+ if (copy_from_user(&sTracedEvents, (void *) pmArg, sizeof(sTracedEvents)))
+ return -EFAULT;
+ break;
+
+ /* Record the details of the event, or not */
+ case TRACER_CONFIG_DETAILS:
+ if (copy_from_user(&sLogEventDetailsMask, (void *) pmArg, sizeof(sLogEventDetailsMask)))
+ return -EFAULT;
+ break;
+
+ /* Record the CPUID associated with the event */
+ case TRACER_CONFIG_CPUID:
+ sLogCPUID = TRUE;
+ break;
+
+ /* Trace only one process */
+ case TRACER_CONFIG_PID:
+ sTracingPID = TRUE;
+ sTracedPID = pmArg;
+ break;
+
+ /* Trace only the given process group */
+ case TRACER_CONFIG_PGRP:
+ sTracingPGRP = TRUE;
+ sTracedPGRP = pmArg;
+ break;
+
+ /* Trace the processes of a given group of users */
+ case TRACER_CONFIG_GID:
+ sTracingGID = TRUE;
+ sTracedGID = pmArg;
+ break;
+
+ /* Trace the processes of a given user */
+ case TRACER_CONFIG_UID:
+ sTracingUID = TRUE;
+ sTracedUID = pmArg;
+ break;
+
+ /* Set the call depth a which the EIP should be fetched on syscall */
+ case TRACER_CONFIG_SYSCALL_EIP_DEPTH:
+ sSyscallEIPDepthSet = TRUE;
+ sSyscallEIPDepth = pmArg;
+ break;
+
+ /* Set the lowerbound address from which EIP is recorded on syscall */
+ case TRACER_CONFIG_SYSCALL_EIP_LOWER:
+ /* We are using bounds for fetching the EIP where syscall was made */
+ sUseSyscallEIPBounds = TRUE;
+
+ /* Set the lower bound */
+ sLowerEIPBound = (void *) pmArg;
+
+ /* The lower bound has been set */
+ sLowerEIPBoundSet = TRUE;
+ break;
+
+ /* Set the upperbound address from which EIP is recorded on syscall */
+ case TRACER_CONFIG_SYSCALL_EIP_UPPER:
+ /* We are using bounds for fetching the EIP where syscall was made */
+ sUseSyscallEIPBounds = TRUE;
+
+ /* Set the upper bound */
+ sUpperEIPBound = (void *) pmArg;
+
+ /* The upper bound has been set */
+ sUpperEIPBoundSet = TRUE;
+ break;
+
+ /* The daemon has comitted the last trace */
+ case TRACER_DATA_COMITTED:
+#if 0
+ /* DEBUG */
+ printk("Tracer: Data has been committed \n");
+#endif
+
+ /* The lockless version doesn't use sSignalSent. pmArg is the
+ number of buffers the daemon has told us it just consumed.
+ Add that to the global count. */
+ if(sBufferControl.using_lockless) {
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* We consumed some buffers, note it. */
+ sBufferControl.buffers_consumed += (u32)pmArg;
+
+ /* If we were full, we no longer are */
+ if(sBuffersFull && ((u32)pmArg > 0))
+ continue_trace();
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+ break;
+ } /* Else locking version below */
+
+ /* Safely set the signal sent flag to FALSE */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+ sSignalSent = FALSE;
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+ break;
+
+ /* Get the number of events lost */
+ case TRACER_GET_EVENTS_LOST:
+ return sEventsLost;
+ break;
+
+ /* Create a user event */
+ case TRACER_CREATE_USER_EVENT:
+ /* Copy the information from user space */
+ if (copy_from_user(&lNewUserEvent, (void *) pmArg, sizeof(lNewUserEvent)))
+ return -EFAULT;
+
+ /* Create the event */
+ lNewUserEventID = trace_create_owned_event(lNewUserEvent.type,
+ lNewUserEvent.desc,
+ lNewUserEvent.format_type,
+ lNewUserEvent.form,
+ current->pid);
+
+ /* Has the operation succeded */
+ if (lNewUserEventID >= 0) {
+ /* Set the event ID */
+ lNewUserEvent.id = lNewUserEventID;
+
+ /* Copy the event information back to user space */
+ if (copy_to_user((void *) pmArg, &lNewUserEvent, sizeof(lNewUserEvent))) {
+ /* Since we were unable to tell the user about the event, destroy it */
+ trace_destroy_event(lNewUserEventID);
+ return -EFAULT;
+ }
+ } else
+ /* Forward trace_create_event()'s error code */
+ return lNewUserEventID;
+ break;
+
+ /* Destroy a user event */
+ case TRACER_DESTROY_USER_EVENT:
+ /* Pass on the user's request */
+ trace_destroy_event((int) pmArg);
+ break;
+
+ /* Trace a user event */
+ case TRACER_TRACE_USER_EVENT:
+ /* Copy the information from user space */
+ if (copy_from_user(&lUserEvent, (void *) pmArg, sizeof(lUserEvent)))
+ return -EFAULT;
+
+ /* Copy the user event data */
+ if (copy_from_user(sUserEventData, lUserEvent.data, lUserEvent.data_size))
+ return -EFAULT;
+
+ /* Log the raw event */
+ lRetValue = trace_raw_event(lUserEvent.id,
+ lUserEvent.data_size,
+ sUserEventData);
+
+ /* Has the operation failed */
+ if (lRetValue < 0)
+ /* Forward trace_create_event()'s error code */
+ return lRetValue;
+ break;
+
+ /* Set event mask */
+ case TRACER_SET_EVENT_MASK:
+ /* Copy the information from user space */
+ if (copy_from_user(&(lTraceMask.mask), (void *) pmArg, sizeof(lTraceMask.mask)))
+ return -EFAULT;
+
+ /* Trace the event */
+ lRetValue = trace(TRACE_EV_CHANGE_MASK, &lTraceMask);
+
+ /* Change the event mask. (This has to be done second or else may loose the
+ information if the user decides to stop logging "change mask" events) */
+ memcpy(&sTracedEvents, &(lTraceMask.mask), sizeof(lTraceMask.mask));
+ syscall_entry_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_ENTRY, &sTracedEvents);
+ syscall_exit_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_EXIT, &sTracedEvents);
+
+ /* Always trace the buffer start, the trace start and the change mask */
+ ltt_set_bit(TRACE_EV_BUFFER_START, &sTracedEvents);
+ ltt_set_bit(TRACE_EV_START, &sTracedEvents);
+ ltt_set_bit(TRACE_EV_CHANGE_MASK, &sTracedEvents);
+
+ /* Forward trace()'s error code */
+ return lRetValue;
+ break;
+
+ /* Get event mask */
+ case TRACER_GET_EVENT_MASK:
+ /* Copy the information to user space */
+ if (copy_to_user((void *) pmArg, &sTracedEvents, sizeof(sTracedEvents)))
+ return -EFAULT;
+ break;
+
+ /* Get buffer control data */
+ case TRACER_GET_BUFFER_CONTROL:
+ /* We can't copy_to_user() with a lock held (accessing user
+ memory may cause a page fault), so buffers_produced may
+ actually be larger than what the daemon sees when this
+ snapshot is taken. This isn't a problem because the
+ daemon will get a chance to read the new buffer the next
+ time it's signaled. */
+ /* Copy the buffer control information to user space */
+ if(copy_to_user((void *) pmArg, &sBufferControl, sizeof(sBufferControl)))
+ return -EFAULT;
+ break;
+
+ /* Unknown command */
+ default:
+ return -ENOSYS;
+ }
+
+ return 0;
+}
+
+/**
+ * tracer_mmap: - "Mmap" file op
+ * @pmInode: the inode associated with the device
+ * @pmFile: file structure given to the acting process
+ * @pmVmArea: Virtual memory area description structure
+ *
+ * Returns:
+ * 0 if ok
+ * -EAGAIN, when remap failed
+ * -EACCESS, permission denied
+ */
+int tracer_mmap(struct file *pmFile,
+ struct vm_area_struct *pmVmArea)
+{
+ int lRetValue; /* Function's return value */
+
+ /* Only the trace daemon is allowed access to mmap */
+ if (current != sDaemonTaskStruct)
+ return -EACCES;
+
+ /* Remap trace buffer into the process's memory space */
+ lRetValue = tracer_mmap_region(pmVmArea,
+ (char *) pmVmArea->vm_start,
+ sTracBuf,
+ pmVmArea->vm_end - pmVmArea->vm_start);
+
+#if 0
+ printk("Tracer: Trace buffer virtual address => 0x%08X \n", (u32) sTracBuf);
+ printk("Tracer: Trace buffer physical address => 0x%08X \n", (u32) virt_to_phys(sTracBuf));
+ printk("Tracer: Trace buffer virtual address in daemon space => 0x%08X \n", (u32) pmVmArea->vm_start);
+ printk("Tracer: Trace buffer physical address in daemon space => 0x%08X \n", (u32) virt_to_phys((void *) pmVmArea->vm_start));
+#endif
+
+ return lRetValue;
+}
+
+/**
+ * tracer_open(): - "Open" file op
+ * @pmInode: the inode associated with the device
+ * @pmFile: file structure given to the acting process
+ *
+ * Returns:
+ * 0, everything went OK
+ * -ENODEV, no such device.
+ * -EBUSY, daemon channel (minor number 0) already in use.
+ */
+int tracer_open(struct inode *pmInode,
+ struct file *pmFile)
+{
+ int lDevMinor = minor(pmInode->i_rdev) & 0x0f; /* Device minor number */
+
+ /* Only minor number 0 and 1 are used */
+ if ((lDevMinor > 0) && (lDevMinor != 1))
+ return -ENODEV;
+
+ /* If the device has already been opened */
+ if (sOpenCount) {
+ /* Is there another process trying to open the daemon's channel (minor number 0) */
+ if (lDevMinor == 0)
+ return -EBUSY;
+ else
+ /* Only increment use, this is just another user process trying to log user events */
+ goto IncrementUse;
+ }
+ /* Fetch the task structure of the process that opened the device */
+ sDaemonTaskStruct = current;
+
+ /* Reset the default configuration since this is the daemon and he will complete the setup */
+ tracer_set_default_config();
+
+#if 0
+ /* DEBUG */
+ printk("<1>Process %d opened the tracing device \n", sDaemonTaskStruct->pid);
+#endif
+
+IncrementUse:
+ /* Lock the device */
+ sOpenCount++;
+
+#ifdef MODULE
+ /* Increment module usage */
+ MOD_INC_USE_COUNT;
+#endif
+
+ return 0;
+}
+
+/**
+ * tracer_release: - "Release" file op
+ * @pmInode: the inode associated with the device
+ * @pmFile: file structure given to the acting process
+ *
+ * Returns:
+ * 0, everything went OK
+ * -EBUSY, there are still event writes in progress so the buffer can't
+ * be released.
+ *
+ * Note:
+ * It is assumed that if the tracing daemon dies, exits or simply stops
+ * existing, the kernel or "someone" will call tracer_release. Otherwise,
+ * we're in trouble ...
+ */
+int tracer_release(struct inode *pmInode,
+ struct file *pmFile)
+{
+ int lCount;
+ int lDevMinor = minor(pmInode->i_rdev) & 0x0f; /* Device minor number */
+
+ /* Is this a simple user process exiting? */
+ if (lDevMinor != 0)
+ goto DecrementUse;
+
+ /* Did we loose any events */
+ if (sEventsLost > 0)
+ printk(KERN_ALERT "Tracer: Lost %d events \n", sEventsLost);
+
+ /* Reset the daemon PID */
+ sDaemonTaskStruct = NULL;
+
+ /* Free the current buffers, if any, but only if they're not still
+ in use */
+ if (sTracBuf != NULL) {
+ lCount = trace_get_pending_write_count();
+ if(lCount == 0)
+ rvfree(sTracBuf, sAllocSize);
+ else {
+ printk(KERN_ERR "Tracer: Couldn't release tracer - %d event writes pending \n",
+ lCount);
+ return -EBUSY;
+ }
+ }
+
+ /* Reset the read and write buffers */
+ sTracBuf = NULL;
+ sWritBuf = NULL;
+ sReadBuf = NULL;
+ sWritBufEnd = NULL;
+ sReadBufEnd = NULL;
+ sWritPos = NULL;
+ sReadLimit = NULL;
+ sWritLimit = NULL;
+ sUseLocking = TRUE;
+
+ /* Reset the tracer's configuration */
+ tracer_set_default_config();
+ sTracerStarted = FALSE;
+
+ /* Reset number of bytes recorded and number of events lost */
+ sBufReadComplete = 0;
+ sSizeReadIncomplete = 0;
+ sEventsLost = 0;
+
+ /* Reset signal sent */
+ sSignalSent = FALSE;
+
+DecrementUse:
+ /* Unlock the device */
+ sOpenCount--;
+
+#ifdef MODULE
+ /* Decrement module usage */
+ MOD_DEC_USE_COUNT;
+#endif
+
+ return 0;
+}
+
+/**
+ * tracer_fsync: - "Fsync" file op
+ * @pmFile: file structure given to the acting process
+ * @pmDEntry: dentry associated with file
+ *
+ * Returns:
+ * 0, everything went OK
+ * -EACCESS, permission denied
+ *
+ * Note:
+ * We need to look the modifications of the values because they are read
+ * and written by trace().
+ */
+int tracer_fsync(struct file *pmFile,
+ struct dentry *pmDEntry,
+ int pmDataSync)
+{
+ unsigned long int lFlags;
+
+ /* Only the trace daemon is allowed access to fsync */
+ if (current != sDaemonTaskStruct)
+ return -EACCES;
+
+ /* Lock the kernel */
+ spin_lock_irqsave(&sSpinLock, lFlags);
+
+ /* Reset the write positions */
+ sWritPos = sWritBuf;
+
+ /* Reset read limit */
+ sReadLimit = sReadBuf;
+
+ /* Reset bytes recorded */
+ sBufReadComplete = 0;
+ sSizeReadIncomplete = 0;
+ sEventsLost = 0;
+
+ /* Reset signal sent */
+ sSignalSent = FALSE;
+
+ /* Unlock the kernel */
+ spin_unlock_irqrestore(&sSpinLock, lFlags);
+
+ return 0;
+}
+
+/**
+ * tracer_set_buffer_size: - Sets the size of the buffers.
+ * @pmSize: Size of buffers
+ *
+ * Returns:
+ * 0, Size setting went OK
+ * -ENOMEM, unable to get a hold of memory for tracer
+ *
+ * sBufnoBits must have already been set before this function is called.
+ */
+int tracer_set_buffer_size(int pmSize)
+{
+ int lSizeAlloc;
+ int lNBuffers = TRACE_MAX_BUFFER_NUMBER(sBufnoBits);
+
+ if(sUseLocking == TRUE)
+ /* Set size to allocate (= pmSize * 2) and fix it's size to be on a page boundary */
+ lSizeAlloc = FIX_SIZE(pmSize << 1);
+ else {
+ /* Calculate power-of-2 buffer size */
+ if(hweight32(pmSize) != 1)
+ /* Invalid if # set bits != 1 */
+ return -EINVAL;
+
+ /* Find position of one and only set bit */
+ sBufOffsetBits = ffs(pmSize) - 1;
+
+ /* Calculate total size of buffers */
+ lSizeAlloc = pmSize * lNBuffers;
+
+ /* Sanity check */
+ if(lSizeAlloc > TRACER_LOCKLESS_MAX_TOTAL_BUF_SIZE)
+ return -EINVAL;
+ }
+
+ /* Free the current buffers, if any, but only if they're not still in use */
+ if (sTracBuf != NULL) {
+ if(trace_get_pending_write_count() == 0)
+ rvfree(sTracBuf, sAllocSize);
+ else
+ return -EBUSY;
+ }
+
+ /* Allocate space for the tracing buffers */
+ if ((sTracBuf = (char *) rvmalloc(lSizeAlloc)) == NULL)
+ return -ENOMEM;
+
+#if 0 /* DEBUG - init all of buffer with easy-to-spot default values */
+ {
+ int i;
+ for(i=0; i<lSizeAlloc; i+=4)
+ *((u32 *)(sTracBuf+i)) = 0xcafebabe;
+ }
+#endif
+
+ /* Remember the size set */
+ sBufSize = pmSize;
+ sAllocSize = lSizeAlloc;
+
+ /* Set the read and write buffers */
+ sWritBuf = sTracBuf;
+ sReadBuf = sTracBuf + sBufSize;
+
+ /* Set end of buffers */
+ sWritBufEnd = sWritBuf + sBufSize;
+ sReadBufEnd = sReadBuf + sBufSize;
+
+ /* Set write position */
+ sWritPos = sWritBuf;
+
+ /* Set read limit */
+ sReadLimit = sReadBuf;
+
+ /* Set write limit */
+ sWritLimit = sWritBufEnd - TRACER_LAST_EVENT_SIZE;
+
+ return 0;
+}
+
+/**
+ * tracer_set_default_config: - Sets the tracer in its default config
+ *
+ * Returns:
+ * 0, everything went OK
+ * -ENOMEM, unable to get a hold of memory for tracer
+ */
+int tracer_set_default_config(void)
+{
+ int i;
+ int lError = 0;
+
+ /* Initialize the event mask */
+ sTracedEvents = 0;
+
+ /* Initialize the event mask with all existing events with their details */
+ for (i = 0; i <= TRACE_EV_MAX; i++) {
+ ltt_set_bit(i, &sTracedEvents);
+ ltt_set_bit(i, &sLogEventDetailsMask);
+ }
+
+ /* Do not interfere with Linux's syscall flow until we actually start tracing */
+ syscall_entry_trace_active = 0;
+ syscall_exit_trace_active = 0;
+
+ /* Forget about the CPUID */
+ sLogCPUID = FALSE;
+
+ /* We aren't tracing any PID or GID in particular */
+ sTracingPID = FALSE;
+ sTracingPGRP = FALSE;
+ sTracingGID = FALSE;
+ sTracingUID = FALSE;
+
+ /* We aren't looking for a particular call depth */
+ sSyscallEIPDepthSet = FALSE;
+
+ /* We aren't going to place bounds on syscall EIP fetching */
+ sUseSyscallEIPBounds = FALSE;
+ sLowerEIPBoundSet = FALSE;
+ sUpperEIPBoundSet = FALSE;
+
+ /* Set the kernel trace configuration to it's basics */
+ trace_set_config(trace,
+ sSyscallEIPDepthSet,
+ sUseSyscallEIPBounds,
+ 0,
+ 0,
+ 0);
+
+ return lError;
+}
+
+/**
+ * tracer_init: - Tracer initialization function.
+ *
+ * Returns:
+ * 0, everything went OK
+ * -ENONMEM, incapable of allocating necessary memory
+ * Forwarded error code otherwise
+ */
+int __init tracer_init(void)
+{
+ int lError = 0;
+
+ /* Initialize configuration */
+ if ((lError = tracer_set_default_config()) < 0)
+ return lError;
+
+ /* Initialize open count */
+ sOpenCount = 0;
+
+ /* Initialize tracer lock */
+ sTracLock = 0;
+
+ /* Initialize signal sent */
+ sSignalSent = FALSE;
+
+ /* Initialize bytes read and events lost */
+ sBufReadComplete = 0;
+ sSizeReadIncomplete = 0;
+ sEventsLost = 0;
+
+ /* Initialize buffer ID */
+ sBufferID = 0;
+
+ /* Initialize tracing daemon task structure */
+ sDaemonTaskStruct = NULL;
+
+ /* Allocate memory for large data components */
+ if ((sUserEventData = vmalloc(CUSTOM_EVENT_MAX_SIZE)) < 0)
+ return -ENOMEM;
+
+ /* Initialize spin lock */
+ sSpinLock = SPIN_LOCK_UNLOCKED;
+
+ /* By default, use locking scheme */
+ sUseLocking = TRUE;
+
+ /* Register the tracer as a char device */
+ sMajorNumber = register_chrdev(0, TRACER_NAME, &sTracerFileOps);
+
+ /* Register the tracer with the kernel */
+ if ((lError = register_tracer(trace)) < 0) {
+ /* Tell the user about the problem */
+ printk(KERN_ALERT "Tracer: Unable to register tracer with kernel, tracer disabled \n");
+
+ /* Make sure no one can open this device */
+ sOpenCount = 1;
+ } else
+ printk(KERN_INFO "Tracer: Initialization complete \n");
+
+ return lError;
+}
+
+/* Is this loaded as a module */
+#ifdef MODULE
+/**
+ * cleanup_module: - Cleanup of the tracer.
+ *
+ * No return values.
+ *
+ * Note: The order of the unregesterings is important. First, rule out any
+ * possibility of getting more trace data. Second, rule out any
+ * possibility of being read by the tracing daemon. Last, free the tracing
+ * buffer, but only if it's not still in use - it's better to lose the
+ * memory than crash the system.
+ */
+void tracer_exit(void)
+{
+ int lCount;
+
+ /* Unregister the tracer from the kernel */
+ unregister_tracer(trace);
+
+ /* Unregister the tracer from being a char device */
+ unregister_chrdev(sMajorNumber, TRACER_NAME);
+
+ /* Free the current buffers, if any, but only if they're not still n use */
+ if (sTracBuf != NULL) {
+ lCount = trace_get_pending_write_count();
+ if(lCount == 0)
+ rvfree(sTracBuf, sAllocSize);
+ else
+ printk(KERN_ERR "Tracer: Couldn't exit tracer - %d event writes pending \n",
+ lCount);
+ }
+
+ /* Paranoia */
+ if(trace_get_pending_write_count() == 0)
+ sTracBuf = NULL;
+}
+module_exit(tracer_exit);
+#endif /* #ifdef MODULE */
+
+module_init(tracer_init);
diff -urpN linux-2.5.37/drivers/trace/tracer.h linux-2.5.37-ltt/drivers/trace/tracer.h
--- linux-2.5.37/drivers/trace/tracer.h Wed Dec 31 19:00:00 1969
+++ linux-2.5.37-ltt/drivers/trace/tracer.h Sat Sep 21 17:56:34 2002
@@ -0,0 +1,233 @@
+/*
+ * drivers/trace/tracer.h
+ *
+ * Copyright (C) 1999, 2000, 2001, 2002 Karim Yaghmour ([email protected])
+ * Portions contributed by T. Halloran: (C) Copyright 2002 IBM Poughkeepsie, IBM Corporation
+ *
+ * This contains the necessary definitions the system tracer
+ */
+
+#ifndef _TRACER_H
+#define _TRACER_H
+
+/* Logic values */
+#define FALSE 0
+#define TRUE 1
+
+/* Structure packing within the trace */
+#ifndef LTT_PACKED_STRUCT
+#if LTT_UNPACKED_STRUCTS
+#define LTT_PACKED_STRUCT
+#else /* if LTT_UNPACKED_STRUCTS */
+#define LTT_PACKED_STRUCT __attribute__ ((packed));
+#endif /* if LTT_UNPACKED_STRUCTS */
+#endif /* if LTT_PACKED_STRUCT */
+
+/* Tracer properties */
+#define TRACER_NAME "tracer" /* Name of the device as seen in /proc/devices */
+
+/* Tracer buffer information */
+#define TRACER_DEFAULT_BUF_SIZE 50000 /* Default size of tracing buffer */
+#define TRACER_MIN_BUF_SIZE 1000 /* Minimum size of tracing buffer */
+#define TRACER_MAX_BUF_SIZE 500000 /* Maximum size of tracing buffer */
+#define TRACER_MIN_BUFFERS 2 /* Minimum number of tracing buffers */
+#define TRACER_MAX_BUFFERS 256 /* Maximum number of tracing buffers */
+
+/* Local definitions */
+typedef u32 trace_time_delta; /* The type used to start the time delta between events */
+
+/* Number of bytes reserved for first event */
+#define TRACER_FIRST_EVENT_SIZE (sizeof(u8) + sizeof(trace_time_delta) + sizeof(trace_buffer_start) + sizeof(uint16_t))
+
+/* Number of bytes reserved for last event, including lost size word */
+#define TRACER_LAST_EVENT_SIZE (sizeof(u8) + sizeof(u8) + sizeof(trace_time_delta) + sizeof(u32))
+
+/* System types */
+#define TRACE_SYS_TYPE_VANILLA_LINUX 1 /* Vanilla linux kernel */
+
+/* The information logged when the tracing is started */
+#define TRACER_MAGIC_NUMBER 0x00D6B7ED /* That day marks an important historical event ... */
+#define TRACER_VERSION_MAJOR 1 /* Major version number */
+#define TRACER_VERSION_MINOR 14 /* Minor version number */
+typedef struct _trace_start {
+ u32 MagicNumber; /* Magic number to identify a trace */
+ u32 ArchType; /* Type of architecture */
+ u32 ArchVariant; /* Variant of the given type of architecture */
+ u32 SystemType; /* Operating system type */
+ u8 MajorVersion; /* Major version of trace */
+ u8 MinorVersion; /* Minor version of trace */
+
+ u32 BufferSize; /* Size of buffers */
+ trace_event_mask EventMask; /* The event mask */
+ trace_event_mask DetailsMask; /* Are the event details logged */
+ u8 LogCPUID; /* Is the CPUID logged */
+} LTT_PACKED_STRUCT trace_start;
+
+/* Start and end of trace buffer information */
+typedef struct _trace_buffer_start {
+ struct timeval Time; /* Time stamp of this buffer */
+ u32 ID; /* Unique buffer ID */
+} LTT_PACKED_STRUCT trace_buffer_start;
+
+/* The configurations possible */
+#define TRACER_START TRACER_MAGIC_NUMBER + 0 /* Start tracing events using the current configuration */
+#define TRACER_STOP TRACER_MAGIC_NUMBER + 1 /* Stop tracing */
+#define TRACER_CONFIG_DEFAULT TRACER_MAGIC_NUMBER + 2 /* Set the tracer to the default configuration */
+#define TRACER_CONFIG_MEMORY_BUFFERS TRACER_MAGIC_NUMBER + 3 /* Set the memory buffers the daemon wants us to use */
+#define TRACER_CONFIG_EVENTS TRACER_MAGIC_NUMBER + 4 /* Trace the given events */
+#define TRACER_CONFIG_DETAILS TRACER_MAGIC_NUMBER + 5 /* Record the details of the event, or not */
+#define TRACER_CONFIG_CPUID TRACER_MAGIC_NUMBER + 6 /* Record the CPUID associated with the event */
+#define TRACER_CONFIG_PID TRACER_MAGIC_NUMBER + 7 /* Trace only one process */
+#define TRACER_CONFIG_PGRP TRACER_MAGIC_NUMBER + 8 /* Trace only the given process group */
+#define TRACER_CONFIG_GID TRACER_MAGIC_NUMBER + 9 /* Trace the processes of a given group of users */
+#define TRACER_CONFIG_UID TRACER_MAGIC_NUMBER + 10 /* Trace the processes of a given user */
+#define TRACER_CONFIG_SYSCALL_EIP_DEPTH TRACER_MAGIC_NUMBER + 11 /* Set the call depth at which the EIP should be fetched on syscall */
+#define TRACER_CONFIG_SYSCALL_EIP_LOWER TRACER_MAGIC_NUMBER + 12 /* Set the lowerbound address from which EIP is recorded on syscall */
+#define TRACER_CONFIG_SYSCALL_EIP_UPPER TRACER_MAGIC_NUMBER + 13 /* Set the upperbound address from which EIP is recorded on syscall */
+#define TRACER_DATA_COMITTED TRACER_MAGIC_NUMBER + 14 /* The daemon has comitted the last trace */
+#define TRACER_GET_EVENTS_LOST TRACER_MAGIC_NUMBER + 15 /* Get the number of events lost */
+#define TRACER_CREATE_USER_EVENT TRACER_MAGIC_NUMBER + 16 /* Create a user tracable event */
+#define TRACER_DESTROY_USER_EVENT TRACER_MAGIC_NUMBER + 17 /* Destroy a user tracable event */
+#define TRACER_TRACE_USER_EVENT TRACER_MAGIC_NUMBER + 18 /* Trace a user event */
+#define TRACER_SET_EVENT_MASK TRACER_MAGIC_NUMBER + 19 /* Set the trace event mask */
+#define TRACER_GET_EVENT_MASK TRACER_MAGIC_NUMBER + 20 /* Get the trace event mask */
+#define TRACER_GET_BUFFER_CONTROL TRACER_MAGIC_NUMBER + 21 /* Get the buffer control data for the lockless schem*/
+#define TRACER_CONFIG_N_MEMORY_BUFFERS TRACER_MAGIC_NUMBER + 22 /* Set the number of memory buffers the daemon wants us to use */
+#define TRACER_CONFIG_USE_LOCKING TRACER_MAGIC_NUMBER + 23 /* Set the locking scheme to use */
+
+/* For the lockless scheme:
+
+ A trace index is composed of two parts, a buffer number and a buffer
+ offset. The actual number of buffers allocated is a run-time decision,
+ although it must be a power of two for efficient computation. We define
+ a maximum number of bits for the buffer number, because the fill_count
+ array in buffer_control must have a fixed size. offset_bits must be at
+ least as large as the maximum event size+start/end buffer event size+
+ lost size word (since a buffer must be able to hold an event of maximum
+ size). Making offset_bits larger reduces fragmentation. Making it
+ smaller increases trace responsiveness. */
+
+/* We need at least enough room for the max custom event, and we also need
+ room for the start and end event. We also need it to be a power of 2. */
+#define TRACER_LOCKLESS_MIN_BUF_SIZE CUSTOM_EVENT_MAX_SIZE + 8192 /* 16K */
+/* Because we use atomic_t as the type for fill_counts, which has only 24
+ usable bits, we have 2**24 = 16M max for each buffer. */
+#define TRACER_LOCKLESS_MAX_BUF_SIZE 0x1000000 /* 16M */
+/* Since we multiply n buffers by the buffer size, this provides a sanity
+ check, much less than the 256*16M possible. */
+#define TRACER_LOCKLESS_MAX_TOTAL_BUF_SIZE 0x8000000 /* 128M */
+
+#define TRACE_MAX_BUFFER_NUMBER(bufno_bits) (1UL << (bufno_bits))
+#define TRACE_BUFFER_SIZE(offset_bits) (1UL << (offset_bits))
+#define TRACE_BUFFER_OFFSET_MASK(offset_bits) (TRACE_BUFFER_SIZE(offset_bits) - 1)
+
+#define TRACE_BUFFER_NUMBER_GET(index, offset_bits) ((index) >> (offset_bits))
+#define TRACE_BUFFER_OFFSET_GET(index, mask) ((index) & (mask))
+#define TRACE_BUFFER_OFFSET_CLEAR(index, mask) ((index) & ~(mask))
+
+/* Flags returned by trace_reserve/trace_reserve_slow */
+#define LTT_BUFFER_SWITCH_NONE 0x00
+#define LTT_EVENT_DISCARD_NONE 0x00
+#define LTT_BUFFER_SWITCH 0x01
+#define LTT_EVENT_DISCARD 0x02
+#define LTT_EVENT_TOO_LONG 0x04
+
+/* Structure used for communicating buffer info between tracer and daemon
+ for lock-free tracing. This is a per-buffer (CPU, etc.) data structure. */
+struct buffer_control
+{
+ int using_lockless;
+ u32 index;
+ u8 bufno_bits;
+ u32 n_buffers; /* cached value */
+ u8 offset_bits;
+ u32 offset_mask; /* cached value */
+ u32 index_mask; /* cached value */
+
+ u32 buffers_produced;
+ u32 buffers_consumed;
+#if CONFIG_LOCKLESS_TRACE
+ /* atomic_t has only 24 usable bits, limiting us to 16M buffers */
+ atomic_t fill_count[TRACER_MAX_BUFFERS];
+#endif /* CONFIG_LOCKLESS_TRACE */
+};
+
+/* If cmpxchg isn't defined for the architecture, we don't want to
+ generate a link error - the locking scheme will still be available. */
+#ifndef __HAVE_ARCH_CMPXCHG
+#define cmpxchg(p,o,n) 0
+#endif
+
+extern __inline__ int ltt_set_bit(int nr, void *addr)
+{
+ unsigned char *p = addr;
+ unsigned char mask = 1 << (nr & 7);
+ unsigned char old;
+
+ p += nr >> 3;
+ old = *p;
+ *p |= mask;
+
+ return ((old & mask) != 0);
+}
+
+extern __inline__ int ltt_clear_bit(int nr, void *addr)
+{
+ unsigned char *p = addr;
+ unsigned char mask = 1 << (nr & 7);
+ unsigned char old;
+
+ p += nr >> 3;
+ old = *p;
+ *p &= ~mask;
+
+ return ((old & mask) != 0);
+}
+
+extern __inline__ int ltt_test_bit(int nr, void *addr)
+{
+ unsigned char *p = addr;
+ unsigned char mask = 1 << (nr & 7);
+
+ p += nr >> 3;
+
+ return ((*p & mask) != 0);
+}
+
+/* Function prototypes */
+int trace
+ (u8,
+ void *);
+void tracer_switch_buffers
+ (struct timeval);
+int tracer_ioctl
+ (struct inode *,
+ struct file *,
+ unsigned int,
+ unsigned long);
+int tracer_mmap
+ (struct file *,
+ struct vm_area_struct *);
+int tracer_open
+ (struct inode *,
+ struct file *);
+int tracer_release
+ (struct inode *,
+ struct file *);
+int tracer_fsync
+ (struct file *,
+ struct dentry *,
+ int);
+#ifdef MODULE
+void tracer_exit
+ (void);
+#endif /* #ifdef MODULE */
+int tracer_set_buffer_size
+ (int);
+int tracer_set_n_buffers
+ (int);
+int tracer_set_default_config
+ (void);
+int tracer_init
+ (void);
+#endif /* _TRACER_H */


2002-09-23 18:06:12

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] LTT for 2.5.37 2/9: Trace driver

Hi!

> +/* Driver */
> +static int sMajorNumber; /* Major number of the tracer */
> +static int sOpenCount; /* Number of times device is open */
> +/* Locking */

Why *s*OpenCount? Some creeping infection by hungarian notation?

Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-09-23 18:26:46

by Richard B. Johnson

[permalink] [raw]
Subject: Re: [PATCH] LTT for 2.5.37 2/9: Trace driver

On Sun, 22 Sep 2002, Pavel Machek wrote:

> Hi!
>
> > +/* Driver */
> > +static int sMajorNumber; /* Major number of the tracer */
> > +static int sOpenCount; /* Number of times device is open */
> > +/* Locking */
>
> Why *s*OpenCount? Some creeping infection by hungarian notation?
>

int YesItLooksAsThoughSomeOfThisHasCreptIntoTheKernelAsWell = TRUE;

Methinks it started in a BusLogic driver and wasn't stamped out
before it spread and infected everything else!

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-09-23 18:45:44

by Karim Yaghmour

[permalink] [raw]
Subject: Re: [PATCH] LTT for 2.5.37 2/9: Trace driver


Pavel Machek wrote:
> Hi!
>
> > +/* Driver */
> > +static int sMajorNumber; /* Major number of the tracer */
> > +static int sOpenCount; /* Number of times device is open */
> > +/* Locking */
>
> Why *s*OpenCount? Some creeping infection by hungarian notation?

"static" as opposed 'l' for "local" elsewhere in the code, but you're looking
at code that is going to change to fit the usual kernel coding style. So all
those S's and L's will go away ;)

Karim

===================================================
Karim Yaghmour
[email protected]
Embedded and Real-Time Linux Expert
===================================================