Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Sun, 22 Sep 2002 01:36:38 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Sun, 22 Sep 2002 01:36:38 -0400 Received: from nameservices.net ([208.234.25.16]:33991 "EHLO opersys.com") by vger.kernel.org with ESMTP id ; Sun, 22 Sep 2002 01:35:30 -0400 Message-ID: <3D8D588B.40F0D5FD@opersys.com> Date: Sun, 22 Sep 2002 01:43:39 -0400 From: Karim Yaghmour Reply-To: karim@opersys.com X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.4.19 i686) X-Accept-Language: en, French/Canada, French/France, fr-FR, fr-CA MIME-Version: 1.0 To: linux-kernel , LTT-Dev , Linus Torvalds Subject: [PATCH] LTT for 2.5.38 2/9: Trace driver Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 96336 Lines: 2751 This is the LTT trace driver itself. Here are the file modifications: drivers/Makefile | 1 drivers/trace/Config.help | 48 drivers/trace/Config.in | 8 drivers/trace/Makefile | 17 drivers/trace/tracer.c | 2398 ++++++++++++++++++++++++++++++++++++++++++++++ drivers/trace/tracer.h | 233 ++++ 6 files changed, 2705 insertions diff -urpN linux-2.5.38/drivers/Makefile linux-2.5.38-ltt/drivers/Makefile --- linux-2.5.38/drivers/Makefile Sun Sep 22 00:25:17 2002 +++ linux-2.5.38-ltt/drivers/Makefile Sun Sep 22 00:51:51 2002 @@ -41,5 +41,6 @@ obj-$(CONFIG_MD) += md/ obj-$(CONFIG_BLUEZ) += bluetooth/ obj-$(CONFIG_HOTPLUG_PCI) += hotplug/ obj-$(CONFIG_ISDN_BOOL) += isdn/ +obj-$(CONFIG_TRACE) += trace/ include $(TOPDIR)/Rules.make diff -urpN linux-2.5.38/drivers/trace/Config.help linux-2.5.38-ltt/drivers/trace/Config.help --- linux-2.5.38/drivers/trace/Config.help Wed Dec 31 19:00:00 1969 +++ linux-2.5.38-ltt/drivers/trace/Config.help Sun Sep 22 00:51:51 2002 @@ -0,0 +1,48 @@ +Kernel events tracing support +CONFIG_TRACE + It is possible for the kernel to log important events to a tracing + driver. Doing so, enables the use of the generated traces in order + to reconstruct the dynamic behavior of the kernel, and hence the + whole system. + + The tracing process contains 4 parts : + 1) The logging of events by key parts of the kernel. + 2) The trace driver that keeps the events in a data buffer. + 3) A trace daemon that opens the trace driver and is notified + every time there is a certain quantity of data to read + from the trace driver (using SIG_IO). + 4) A trace event data decoder that reads the accumulated data + and formats it in a human-readable format. + + If you say Y or M here, the first part of the tracing process will + always take place. That is, critical parts of the kernel will call + upon the kernel tracing function. The data generated doesn't go + any further until a trace driver registers himself as such with the + kernel. Therefore, if you answer Y, then the driver will be part of + the kernel and the events will always proceed onto the driver and + if you say M, then the events will only proceed onto the driver when + it's module is loaded. Note that event's aren't logged in the driver + until the profiling daemon opens the device, configures it and + issues the "start" command through ioctl(). + + The impact of a fully functionnal system (kernel event logging + + driver event copying + active trace daemon) is of 2.5% for core events. + This means that for a task that took 100 seconds on a normal system, it + will take 102.5 seconds on a traced system. 
This is very low compared + to other profiling or tracing methods. + + For more information on kernel tracing, the trace daemon or the event + decoder, please check the following address : + http://www.opersys.com/LTT + +CONFIG_LOCKLESS_TRACE + There are normally two tracing schemes available and selectable at + run-time via the trace daemon - locking and lockless. In some cases + e.g. embedded real-time systems, it may be desirable to exclude the + lockless code from the driver in the interest of making it smaller. + Even in such a case, the advantages provided by the lockless code + outweigh the slight increase in size (about 4KB). Unless you're + really out of space, keep this to Y. Setting this to N is probably + a sign that you probably have size problems elsewhere ... + + If unsure, say Y. diff -urpN linux-2.5.38/drivers/trace/Config.in linux-2.5.38-ltt/drivers/trace/Config.in --- linux-2.5.38/drivers/trace/Config.in Wed Dec 31 19:00:00 1969 +++ linux-2.5.38-ltt/drivers/trace/Config.in Sun Sep 22 00:51:51 2002 @@ -0,0 +1,8 @@ +mainmenu_option next_comment +comment 'Kernel tracing' +tristate 'Kernel events tracing support' CONFIG_TRACE +if [ "$CONFIG_TRACE" != "n" ]; then + dep_mbool ' Lock-free tracing support' CONFIG_LOCKLESS_TRACE $CONFIG_TRACE +fi + +endmenu diff -urpN linux-2.5.38/drivers/trace/Makefile linux-2.5.38-ltt/drivers/trace/Makefile --- linux-2.5.38/drivers/trace/Makefile Wed Dec 31 19:00:00 1969 +++ linux-2.5.38-ltt/drivers/trace/Makefile Sun Sep 22 00:51:51 2002 @@ -0,0 +1,17 @@ +# +# Makefile for the kernel tracing drivers. +# +# Note! Dependencies are done automagically by 'make dep', which also +# removes any old dependencies. DON'T put your own dependencies here +# unless it's something special (ie not a .c file). +# +# Note 2! The CFLAGS definitions are now inherited from the +# parent makes.. +# + +O_TARGET := built-in.o + +# Is it loaded as a module or as part of the kernel +obj-$(CONFIG_TRACE) = tracer.o + +include $(TOPDIR)/Rules.make diff -urpN linux-2.5.38/drivers/trace/tracer.c linux-2.5.38-ltt/drivers/trace/tracer.c --- linux-2.5.38/drivers/trace/tracer.c Wed Dec 31 19:00:00 1969 +++ linux-2.5.38-ltt/drivers/trace/tracer.c Sun Sep 22 00:51:51 2002 @@ -0,0 +1,2398 @@ +/* + * linux/drivers/trace/tracer.c + * + * (C) Copyright, 1999, 2000, 2001, 2002 - Karim Yaghmour (karim@opersys.com) + * + * Contains the code for the kernel tracing driver (tracer for short). + * + * Author: + * Karim Yaghmour (karim@opersys.com) + * + * Changelog: + * 16/02/02, Added Tom Zanussi's implementation of K42's lockless logging. + * K42 tracing guru Robert Wisniewski participated in the + * discussions surrounding this implementation. A big thanks to + * the IBM folks. + * 03/12/01, Added user event support. + * 05/01/01, Modified PPC bit manipulation functions for x86 compatibility. + * (andy_lowe@mvista.com) + * 15/11/00, Finally fixed memory allocation and remapping method. Now using + * BTTV-driver-inspired code. + * 13/03/00, Modified tracer so that the daemon mmaps the tracer's buffers + * in it's address space rather than use "read". + * 26/01/00, Added support for standardized buffer sizes and extensibility + * of events. + * 01/10/99, Modified tracer in order to used double-buffering. + * 28/09/99, Adding tracer configuration support. + * 09/09/99, Chaging the format of an event record in order to reduce the + * size of the traces. + * 04/03/99, Initial typing. 
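(A quick aside on the build glue above before getting into tracer.c itself: with Config.in and the drivers/Makefile hook in place, a configuration that enables both options would carry, for example,

    CONFIG_TRACE=y
    CONFIG_LOCKLESS_TRACE=y

in .config. This is a hypothetical excerpt; CONFIG_TRACE=m builds tracer.o as a module instead, and the object ends up under drivers/trace/ either way.)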
+ * + * Note: + * The sizes of the variables used to store the details of an event are + * planned for a system who gets at least one clock tick every 10 + * milli-seconds. There has to be at least one event every 2^32-1 + * microseconds, otherwise the size of the variable holding the time doesn't + * work anymore. + */ + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include + +#include "tracer.h" + +/* Module information */ +MODULE_AUTHOR("Karim Yaghmour (karim@opersys.com)"); +MODULE_DESCRIPTION("Linux Trace Toolkit (LTT) kernel tracing driver"); +MODULE_LICENSE("GPL"); + +/* Driver */ +static int sMajorNumber; /* Major number of the tracer */ +static int sOpenCount; /* Number of times device is open */ +/* Locking */ +static int sTracLock; /* Tracer lock used to lock primary buffer */ +static spinlock_t sSpinLock; /* Spinlock in order to lock kernel */ +/* Daemon */ +static int sSignalSent; /* A signal has been sent to the daemon */ +static struct task_struct* sDaemonTaskStruct; /* Task structure of the tracer daemon */ +/* Tracer configuration */ +static int sTracerStarted; /* Is the tracer started */ +static trace_event_mask sTracedEvents; /* Bit-field of events being traced */ +static trace_event_mask sLogEventDetailsMask; /* Log the details of the events mask */ +static int sLogCPUID; /* Log the CPUID associated with each event */ +static int sUseSyscallEIPBounds; /* Use adress bounds to fetch the EIP where call is made */ +static int sLowerEIPBoundSet; /* The lower bound EIP has been set */ +static int sUpperEIPBoundSet; /* The upper bound EIP has been set */ +static void* sLowerEIPBound; /* The lower bound EIP */ +static void* sUpperEIPBound; /* The upper bound EIP */ +static int sTracingPID; /* Tracing only the events for one pid */ +static int sTracingPGRP; /* Tracing only the events for one process group */ +static int sTracingGID; /* Tracing only the events for one gid */ +static int sTracingUID; /* Tracing only the events for one uid */ +static pid_t sTracedPID; /* PID being traced */ +static pid_t sTracedPGRP; /* Process group being traced */ +static gid_t sTracedGID; /* GID being traced */ +static uid_t sTracedUID; /* UID being traced */ +static int sSyscallEIPDepthSet; /* The call depth at which to fetch EIP has been set */ +static int sSyscallEIPDepth; /* The call depth at which to fetch the EIP */ +/* Event data buffers */ +static int sBufReadComplete; /* Number of buffers completely filled */ +static int sSizeReadIncomplete; /* Quantity of data read from incomplete buffers */ +static int sEventsLost; /* Number of events lost because of lack of buffer space */ +static u32 sBufSize; /* Buffer sizes */ +static u32 sAllocSize; /* Size of buffers allocated */ +static u32 sBufferID; /* Unique buffer ID */ +static char* sTracBuf = NULL; /* Trace buffer */ +static char* sWritBuf = NULL; /* Buffer used for writting */ +static char* sReadBuf = NULL; /* Buffer used for reading */ +static char* sWritBufEnd; /* End of write buffer */ +static char* sReadBufEnd; /* End of read buffer */ +static char* sWritPos; /* Current position for writting */ +static char* sReadLimit; /* Limit at which read should stop */ +static char* sWritLimit; /* Limit at which write should stop */ +static int sUseLocking; /* Holds command from daemon */ +static u32 sBufnoBits; /* Holds command from daemon */ +static u32 sBufOffsetBits; /* Holds command from daemon */ +static int sBuffersFull; 
/* All-buffers-full boolean */ + +/* Time */ +static struct timeval sBufferStartTime; /* The time at which the buffer was started */ + +/* Large data components allocated at load time */ +static char *sUserEventData = NULL; /* The data associated with a user event */ + +/* The global per-buffer control data structure, shared between the tracing + driver and the trace daemon via ioctl. */ +static struct buffer_control sBufferControl; + +/* The size of the structures used to describe the events */ +static int sEventStructSize[TRACE_EV_MAX + 1] = +{ + sizeof(trace_start) /* TRACE_START */ , + sizeof(trace_syscall_entry) /* TRACE_SYSCALL_ENTRY */ , + 0 /* TRACE_SYSCALL_EXIT */ , + sizeof(trace_trap_entry) /* TRACE_TRAP_ENTRY */ , + 0 /* TRACE_TRAP_EXIT */ , + sizeof(trace_irq_entry) /* TRACE_IRQ_ENTRY */ , + 0 /* TRACE_IRQ_EXIT */ , + sizeof(trace_schedchange) /* TRACE_SCHEDCHANGE */ , + 0 /* TRACE_KERNEL_TIMER */ , + sizeof(trace_soft_irq) /* TRACE_SOFT_IRQ */ , + sizeof(trace_process) /* TRACE_PROCESS */ , + sizeof(trace_file_system) /* TRACE_FILE_SYSTEM */ , + sizeof(trace_timer) /* TRACE_TIMER */ , + sizeof(trace_memory) /* TRACE_MEMORY */ , + sizeof(trace_socket) /* TRACE_SOCKET */ , + sizeof(trace_ipc) /* TRACE_IPC */ , + sizeof(trace_network) /* TRACE_NETWORK */ , + sizeof(trace_buffer_start) /* TRACE_BUFFER_START */ , + 0 /* TRACE_BUFFER_END */ , + sizeof(trace_new_event) /* TRACE_NEW_EVENT */ , + sizeof(trace_custom) /* TRACE_CUSTOM */ , + sizeof(trace_change_mask) /* TRACE_CHANGE_MASK */ +}; + +/* The file operations available for the tracer */ +static struct file_operations sTracerFileOps = +{ + owner: THIS_MODULE, + ioctl: tracer_ioctl, + mmap: tracer_mmap, + open: tracer_open, + release: tracer_release, + fsync: tracer_fsync, +}; + +#if CONFIG_LOCKLESS_TRACE +static u32 sLastEventIndex; /* For full-buffers state */ +static struct timeval sLastEventTimeStamp; /* For full-buffers state */ +/* Space reserved for TRACE_EV_BUFFER_START */ +static u32 sStartReserve = TRACER_FIRST_EVENT_SIZE; + +/* Space reserved for TRACE_EV_BUFFER_END event + sizeof lost word, which + though the sizeof lost word isn't necessarily contiguous with rest of + event (it's always at the end of the buffer) is included here for code + clarity. */ +static u32 sEndReserve = TRACER_LAST_EVENT_SIZE; +#endif /* CONFIG_LOCKLESS_TRACE */ + +/* This inspired by rtai/shmem */ +#define FIX_SIZE(x) (((x) - 1) & PAGE_MASK) + PAGE_SIZE + +/* \begin{Code inspired from BTTV driver} */ + +/* Here we want the physical address of the memory. + * This is used when initializing the contents of the + * area and marking the pages as reserved. 
+ */ +static inline unsigned long kvirt_to_pa(unsigned long adr) +{ + unsigned long kva, ret; + + kva = (unsigned long) page_address(vmalloc_to_page((void *) adr)); + kva |= adr & (PAGE_SIZE - 1); /* restore the offset */ + ret = __pa(kva); + return ret; +} + +static void *rvmalloc(unsigned long size) +{ + void *mem; + unsigned long adr; + + mem = vmalloc_32(size); + if (!mem) + return NULL; + + memset(mem, 0, size); /* Clear the ram out, no junk to the user */ + adr = (unsigned long) mem; + while (size > 0) { + mem_map_reserve(vmalloc_to_page((void *) adr)); + adr += PAGE_SIZE; + size -= PAGE_SIZE; + } + + return mem; +} + +static void rvfree(void *mem, unsigned long size) +{ + unsigned long adr; + + if (!mem) + return; + + adr = (unsigned long) mem; + while ((long) size > 0) { + mem_map_unreserve(vmalloc_to_page((void *) adr)); + adr += PAGE_SIZE; + size -= PAGE_SIZE; + } + vfree(mem); +} + +static int tracer_mmap_region(struct vm_area_struct *vma, + const char *adr, + const char *start_pos, + unsigned long size) +{ + unsigned long start = (unsigned long) adr; + unsigned long page, pos; + + pos = (unsigned long) start_pos; + while (size > 0) { + page = kvirt_to_pa(pos); + if (remap_page_range(vma, start, page, PAGE_SIZE, PAGE_SHARED)) + return -EAGAIN; + start += PAGE_SIZE; + pos += PAGE_SIZE; + size -= PAGE_SIZE; + } + return 0; +} +/* \end{Code inspired from BTTV driver} */ + +/** + * tracer_write_to_buffer: - Write data to destination buffer + * + * Writes data to the destination buffer and updates the begining the + * buffer write position. + */ +#define tracer_write_to_buffer(DEST, SRC, SIZE) \ +do\ +{\ + memcpy(DEST, SRC, SIZE);\ + DEST += SIZE;\ +} while(0); + +#if CONFIG_LOCKLESS_TRACE +/*** Lockless scheme functions ***/ + +/** + * init_buffer_control: - Init buffer control struct for new tracing run. + * @pmBC: buffer control struct to be initialized + * @pmUseLockless: which tracing scheme to use, TRUE for lockless + * @pmBufnoBits: number of bits in index word to use for buffer number + * @pmOffsetBits: number of bits in index word to use for buffer offset + * + * Sanity of param values should be checked by caller. i.e. bufno_bits and + * offset_bits must reflect sane buffer sizes/numbers. + */ +static void init_buffer_control(struct buffer_control * pmBC, + int pmUseLockless, + u8 pmBufnoBits, + u8 pmOffsetBits) +{ + unsigned i; + + if((pmBC->using_lockless = pmUseLockless) == TRUE) { + pmBC->index = sStartReserve; + pmBC->bufno_bits = pmBufnoBits; + pmBC->n_buffers = TRACE_MAX_BUFFER_NUMBER(pmBufnoBits); + pmBC->offset_bits = pmOffsetBits; + pmBC->offset_mask = TRACE_BUFFER_OFFSET_MASK(pmOffsetBits); + pmBC->index_mask = (1UL << (pmBufnoBits + pmOffsetBits)) - 1; + + pmBC->buffers_produced = pmBC->buffers_consumed = 0; + + /* When a new buffer is switched to, TRACE_BUFFER_SIZE is + subtracted from its fill_count in order to initialize it + to the empty state. The reason it's done this way is + because an intervening event may have already been written + to the buffer while we were in the process of switching and + thus blindly initializing to 0 would erase that event. + The first buffer is initialized to 0 and the others are + initialized to TRACE_BUFFER_SIZE because the very first + buffer we ever see won't be initialized in that way by + the switching code and since there's never been an event, + we know it should be 0 and that it must be explicitly + initialized that way before logging begins. 
sStartReserve + is is factored into the end-of-buffer processing, so isn't + added to the fill counts here, except for the first. */ + atomic_set(&pmBC->fill_count[0], (int)sStartReserve); + for(i = 1; i < TRACER_MAX_BUFFERS; i++) + atomic_set(&pmBC->fill_count[i], (int)TRACE_BUFFER_SIZE(pmOffsetBits)); + + /* All buffers are empty at this point */ + sBuffersFull = FALSE; + } +} + +/* These inline atomic functions wrap the linux versions in order to + implement the interface we want as well as to ensure memory barriers. */ + +/** + * compare_and_store_volatile: - Self-explicit + * @ptr: ptr to the word that will receive the new value + * @oval: the value we think is currently in *ptr + * @nval: the value *ptr will get if we were right + * + * If *ptr is still what we think it is, atomically assign nval to it and + * return a boolean indicating TRUE if the new value was stored, FALSE + * otherwise. + * + * Pseudocode for this operation: + * + * if(*ptr == oval) { + * *ptr = nval; + * return TRUE; + * } else { + * return FALSE; + * } + */ +inline int compare_and_store_volatile(volatile u32 *ptr, + u32 oval, + u32 nval) +{ + u32 prev; + + barrier(); + prev = cmpxchg(ptr, oval, nval); + barrier(); + + return (prev == oval); +} + +/** + * atomic_set_volatile: - Atomically set the value in ptr to nval. + * @ptr: ptr to the word that will receive the new value + * @nval: the new value + * + * Uses memory barriers to set *ptr to nval. + */ +inline void atomic_set_volatile(atomic_t *ptr, + u32 nval) +{ + barrier(); + atomic_set(ptr, (int)nval); + barrier(); +} + +/** + * atomic_add_volatile: - Atomically add val to the value at ptr. + * @ptr: ptr to the word that will receive the addition + * @val: the value to add to *ptr + * + * Uses memory barriers to add val to *ptr. + */ +inline void atomic_add_volatile(atomic_t *ptr, u32 val) +{ + barrier(); + atomic_add((int)val, ptr); + barrier(); +} + +/** + * atomic_sub_volatile: - Atomically substract val from the value at ptr. + * @ptr: ptr to the word that will receive the subtraction + * @val: the value to subtract from *ptr + * + * Uses memory barriers to substract val from *ptr. + */ +inline void atomic_sub_volatile(atomic_t *ptr, s32 val) +{ + barrier(); + atomic_sub((int)val, ptr); + barrier(); +} + +/** + * trace_commit: - Atomically commit a reserved slot in the buffer. + * @index: index into the trace buffer + * @len: the value to add to fill_count of the buffer contained in index + * + * Atomically add len to the fill_count of the buffer specified by the + * buffer number contained in index. + */ +static inline void trace_commit(u32 index, u32 len) +{ + u32 bufno = TRACE_BUFFER_NUMBER_GET(index, sBufferControl.offset_bits); + atomic_add_volatile(&sBufferControl.fill_count[bufno], len); +} + +/** + * write_start_buffer_event: - Write start-buffer event to buffer start. + * @pmIndex: index into the trace buffer + * @pmTime: the time of the start-buffer event + * + * Writes start-buffer event at the start of the buffer specified by the + * buffer number contained in pmIndex. 
+ */ +static inline void write_start_buffer_event(u32 pmIndex, struct timeval pmTime) +{ + trace_buffer_start lStartBufferEvent; /* Start of new buffer event */ + u8 lEventID; /* Event ID of last event */ + uint16_t lDataSize; /* Size of tracing data */ + trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */ + char* lWritPos; /* Current position for writing */ + + /* Clear the offset bits of index to get the beginning of buffer */ + lWritPos = sTracBuf + TRACE_BUFFER_OFFSET_CLEAR(pmIndex, + sBufferControl.offset_mask); + + /* Increment buffer ID */ + sBufferID++; + + /* Write the start of buffer event */ + lStartBufferEvent.ID = sBufferID; + lStartBufferEvent.Time = pmTime; + + /* Write event type to tracing buffer */ + lEventID = TRACE_EV_BUFFER_START; + tracer_write_to_buffer(lWritPos, + &lEventID, + sizeof(lEventID)); + + /* Write event time delta to tracing buffer */ + lTimeDelta = 0; + tracer_write_to_buffer(lWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); + + /* Write event structure */ + tracer_write_to_buffer(lWritPos, + &lStartBufferEvent, + sizeof(lStartBufferEvent)); + + /* Compute the data size */ + lDataSize = sizeof(lEventID) + + sizeof(lTimeDelta) + + sizeof(lStartBufferEvent) + + sizeof(lDataSize); + + /* Write the length of the event description */ + tracer_write_to_buffer(lWritPos, + &lDataSize, + sizeof(lDataSize)); +} + +/** + * write_end_buffer_event: - Write end-buffer event to end of buffer. + * @pmIndex: index into the trace buffer + * @pmTime: the time of the end-buffer event + * + * Writes end-buffer event at the end of the buffer specified by the + * buffer number contained in pmIndex, at the offset also contained in + * pmIndex. + */ +static inline void write_end_buffer_event(u32 pmIndex, struct timeval pmTime) +{ + u8 lEventID; /* Event ID of last event */ + u8 lCPUID; /* CPUID of currently runing process */ + trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */ + char* lWritPos; /* Current position for writing */ + + lWritPos = sTracBuf + pmIndex; + + /* Write the CPUID to the tracing buffer, if required */ + if (sLogCPUID == TRUE) { + lCPUID = smp_processor_id(); + tracer_write_to_buffer(lWritPos, + &lCPUID, + sizeof(lCPUID)); + } + /* Write event type to tracing buffer */ + lEventID = TRACE_EV_BUFFER_END; + tracer_write_to_buffer(lWritPos, + &lEventID, + sizeof(lEventID)); + + /* Write event time delta to tracing buffer */ + lTimeDelta = 0; + tracer_write_to_buffer(lWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); +} + +/** + * write_lost_size: - Write lost size to end of buffer contained in index. + * @pmIndex: index into the trace buffer + * @pmSizeLost: number of bytes lost at the end of buffer + * + * Writes the value contained in pmSizeLost as the last word in the + * the buffer specified by the buffer number contained in pmIndex. The + * 'lost size' is the number of bytes that are left unused by the tracing + * scheme at the end of a buffer for a variety of reasons. + */ +static inline void write_lost_size(u32 pmIndex, u32 pmSizeLost) +{ + char* lWritBufEnd; /* End of buffer */ + + /* Get end of buffer by clearing offset and adding buffer size */ + lWritBufEnd = sTracBuf + + TRACE_BUFFER_OFFSET_CLEAR(pmIndex, sBufferControl.offset_mask) + + TRACE_BUFFER_SIZE(sBufferControl.offset_bits); + + /* Write size lost at the end of the buffer */ + *((u32 *) (lWritBufEnd - sizeof(pmSizeLost))) = pmSizeLost; +} + +/** + * finalize_buffer: - Utility function consolidating end-of-buffer tasks. 
+ * @pmEndIndex: index into trace buffer to write the end-buffer event at + * @pmSizeLost: number of unused bytes at the end of the buffer + * @pmTimestamp: the time of the end-buffer event + * + * This function must be called from within a lock, because it increments + * buffers_produced. + */ +static inline void finalize_buffer(u32 pmEndIndex, u32 pmSizeLost, struct timeval *pmTimestamp) +{ + /* Write end buffer event as last event in old buffer. */ + write_end_buffer_event(pmEndIndex, *pmTimestamp); + + /* In any buffer switch, we need to write out the lost size, + which can be 0. */ + write_lost_size(pmEndIndex, pmSizeLost); + + /* Add the size lost and end event size to fill_count so that + the old buffer won't be seen as incomplete. */ + trace_commit(pmEndIndex, pmSizeLost); + + /* Every finalized buffer means a produced buffer */ + sBufferControl.buffers_produced++; +} + +/** + * finalize_lockless_trace: - finalize last buffer at end of trace + * + * Called when tracing is stopped, to finish processing last buffer. + */ +static inline void finalize_lockless_trace(void) +{ + u32 lEventsEnd; /* Index of end of last event */ + u32 lSizeLost; /* Bytes after end of last event */ + unsigned long int lFlags; /* CPU flags for lock */ + + /* Find index of end of last event */ + lEventsEnd = TRACE_BUFFER_OFFSET_GET(sBufferControl.index, sBufferControl.offset_mask); + + /* Size lost in buffer is the unused space after end of last event + and end of buffer. */ + lSizeLost = TRACE_BUFFER_SIZE(sBufferControl.offset_bits) - lEventsEnd; + + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* Write end event etc. and increment buffers_produced. The + time used here is what the locking version uses as well. */ + finalize_buffer(sBufferControl.index & sBufferControl.index_mask, lSizeLost, &sBufferStartTime); + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); +} + +/** + * discard_check: - Determine whether an event should be discarded. + * @pmOldIndex: index into trace buffer where check for space should begin + * @pmLen: the length of the event to check + * @pmTimestamp: the time of the end-buffer event + * + * Checks whether an event of size pmLen will fit into the available + * buffer space as indicated by the value in pmOldIndex. A side effect + * of this function is that if the length would fill or overflow the + * last available buffer, that buffer will be finalized and all + * subsequent events will be automatically discarded until a buffer is + * later freed. 
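To make the buffer-accounting arithmetic used here concrete, the following is a small stand-alone sketch. It is illustrative only and not part of the patch: it assumes bufno_bits = 2 and offset_bits = 17 (four 128KB buffers), and that the TRACE_BUFFER_* macros defined in tracer.h are the obvious shift/mask operations.

    #include <stdio.h>

    int main(void)
    {
            unsigned int bufno_bits = 2, offset_bits = 17;   /* assumed values */
            unsigned int n_buffers  = 1U << bufno_bits;      /* 4 buffers      */
            unsigned int index      = 0x5204C;               /* a reserved pos */
            unsigned int bufno, offset, produced = 7, consumed = 4;

            /* Split the index word into buffer number and offset, the way
               the driver's macros do. */
            bufno  = index >> offset_bits;                   /* == 2           */
            offset = index & ((1U << offset_bits) - 1);      /* == 0x1204C     */
            printf("slot lives in buffer %u at offset 0x%X\n", bufno, offset);

            /* discard_check()'s overrun test: when the writer has produced
               n_buffers - 1 more buffers than the daemon has consumed, it is
               sitting in the last free buffer. That buffer is finalized and
               further events are dropped until a buffer is consumed. */
            if (produced - consumed == n_buffers - 1)
                    printf("last free buffer reached, stop logging\n");

            return 0;
    }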
+ * + * The return value contains the result flags and is an ORed combination + * of the following: + * + * LTT_EVENT_DISCARD_NONE - event should not be discarded + * LTT_BUFFER_SWITCH - buffer switch occurred + * LTT_EVENT_DISCARD - event should be discarded (all buffers are full) + * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer + */ +static inline int discard_check(u32 pmOldIndex, + u32 pmLen, + struct timeval *pmTimestamp) +{ + u32 lBuffersReady; + u32 lOffsetMask = sBufferControl.offset_mask; + u8 lOffsetBits = sBufferControl.offset_bits; + u32 lIndexMask = sBufferControl.index_mask; + u32 lSizeLost; + unsigned long int lFlags; /* CPU flags for lock */ + + /* Check whether the event is larger than a buffer */ + if(pmLen >= TRACE_BUFFER_SIZE(sBufferControl.offset_bits)) + return LTT_EVENT_DISCARD | LTT_EVENT_TOO_LONG; + + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* We're already overrun, nothing left to do */ + if(sBuffersFull == TRUE) { + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + return LTT_EVENT_DISCARD; + } + + lBuffersReady = sBufferControl.buffers_produced - sBufferControl.buffers_consumed; + + /* If this happens, we've been pushed to the edge of the last + available buffer which means we need to finalize it and increment + buffers_produced. However, we don't want to allow + sBufferControl.index to be actually pushed to full or beyond, + otherwise we'd just be wrapping around and allowing subsequent + events to overwrite good buffers. It is true that there may not + be enough space for this event, but there could be space for + subsequent smaller event(s). It doesn't matter if they write + themselves, because here we say that anything after the old_index + passed in to this function is lost, even if other events have or + will reserve space in this last buffer. Nor can any other event + reserve space in buffers following this one, until at least one + buffer is consumed by the daemon. */ + if(lBuffersReady == sBufferControl.n_buffers - 1) { + /* We set this flag so we only do this once per overrun */ + sBuffersFull = TRUE; + + /* Get the time of the event */ + do_gettimeofday(pmTimestamp); + + /* Size lost is everything after old_index */ + lSizeLost = TRACE_BUFFER_SIZE(lOffsetBits) + - TRACE_BUFFER_OFFSET_GET(pmOldIndex, lOffsetMask); + + /* Write end event and lost size. This increases buffer_count + by the lost size, which is important later when we add the + deferred size. */ + finalize_buffer(pmOldIndex & lIndexMask, lSizeLost, pmTimestamp); + + /* We need to add the lost size to old index, but we can't + do it now, or we'd roll index over and allow new events, + so we defer it until a buffer is free. Note however that + buffer_count does get incremented by lost size, which is + important later when start logging again. */ + sLastEventIndex = pmOldIndex; + sLastEventTimeStamp = *pmTimestamp; + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + /* We lose this event */ + return LTT_BUFFER_SWITCH | LTT_EVENT_DISCARD; + } + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + /* Nothing untoward happened */ + return LTT_EVENT_DISCARD_NONE; +} + +/** + * trace_reserve_slow: - The slow reserve path in the lockless scheme. 
+ * @pmOldIndex: the value of the buffer control index when we were called + * @pmLen: the length of the slot to reserve + * @pmIndex: variable that will receive the start pos of the reserved slot + * @pmTimestamp: variable that will receive the time the slot was reserved + * + * Called by trace_reserve() if the length of the event being logged would + * most likely cause a 'buffer switch'. The value of the variable pointed + * to by pmIndex will contain the index actually reserved by this + * function. The timestamp reflecting the time the slot was reserved + * will be saved in *pmTimestamp. The return value indicates whether + * there actually was a buffer switch (not inevitable in all cases). + * If the return value also indicates a discarded event, the values in + * *pmIndex and *pmTimestamp will be indeterminate. + * + * The return value contains the result flags and is an ORed combination + * of the following: + * + * LTT_BUFFER_SWITCH_NONE - no buffer switch occurred + * LTT_EVENT_DISCARD_NONE - event should not be discarded + * LTT_BUFFER_SWITCH - buffer switch occurred + * LTT_EVENT_DISCARD - event should be discarded (all buffers are full) + * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer + */ +static inline int trace_reserve_slow(u32 pmOldIndex, /* needed for overruns */ + u32 pmLen, + u32 *pmIndex, + struct timeval *pmTimestamp) +{ + u32 lNewIndex, lOffset, lNewBufno; + unsigned long int lFlags; /* CPU flags for lock */ + u32 lOffsetMask = sBufferControl.offset_mask; + u8 lOffsetBits = sBufferControl.offset_bits; + u32 lIndexMask = sBufferControl.index_mask; + u32 lSizeLost = sEndReserve; /* size lost always includes end event */ + int lDiscardEvent; + int lBufferSwitched = LTT_BUFFER_SWITCH_NONE; + + /* We don't get here unless the event might cause a buffer switch */ + + /* First check whether conditions exist do discard the event */ + lDiscardEvent = discard_check(pmOldIndex, pmLen, pmTimestamp); + if(lDiscardEvent != LTT_EVENT_DISCARD_NONE) + return lDiscardEvent; + + /* If we're here, we still have free buffers to reserve from */ + + /* Do this until we reserve a spot for the event */ + do { + /* Yeah, we're re-using a param variable, is that bad form? */ + pmOldIndex = sBufferControl.index; + + /* We're here because the event + ending reserve space would + overflow or exactly fill old buffer. Calculate new index + again. */ + lNewIndex = pmOldIndex + pmLen; + + /* We only care about the offset part of the new index */ + lOffset = TRACE_BUFFER_OFFSET_GET(lNewIndex + sEndReserve, lOffsetMask); + + /* If we would actually overflow and not exactly fill the old + buffer, we reserve the first slot (after adding a buffer + start event) in the new one. */ + if((lOffset < pmLen) && (lOffset > 0)) { + + /* This is an overflow, not an exact fit. The + reserved index is just after the space reserved for + the start event in the new buffer. */ + *pmIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex + sEndReserve, lOffsetMask) + + sStartReserve; + + /* Now the next free space is at the reserved index + plus the length of this event. */ + lNewIndex = *pmIndex + pmLen; + } else if (lOffset < pmLen) { + /* We'll exactly fill the old buffer, so our reserved + index is still in the old buffer and our new index + is in the new one + sStartReserve */ + *pmIndex = pmOldIndex; + lNewIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex + sEndReserve, lOffsetMask) + + sStartReserve; + } else + /* another event has actually pushed us into a new + buffer since we were called. 
*/ + *pmIndex = pmOldIndex; + + /* Get the time of the event */ + do_gettimeofday(pmTimestamp); + } while (!compare_and_store_volatile(&sBufferControl.index, + pmOldIndex, lNewIndex)); + + /* Once we're successful in saving a new_index as the authoritative + new global buffer control index, finish the buffer switch + processing. */ + + /* Mask off the high bits outside of our reserved index */ + *pmIndex &= lIndexMask; + + /* At this point, our indices are set in stone, so we can safely + write our start and end events and lost count to our buffers. + The first test here could fail if between the time reserve_slow + was called and we got a reserved slot, we slept and someone else + did the buffer switch already. */ + if(lOffset < pmLen) { /* Event caused a buffer switch. */ + if(lOffset > 0) /* We didn't exactly fill the old buffer */ + /* Set the size lost value in the old buffer. That + value is len+sEndReserve-offset-sEndReserve, + i.e. sEndReserve cancels itself out. */ + lSizeLost += pmLen - lOffset; + else /* We exactly filled the old buffer */ + /* Since we exactly filled the old buffer, the index + we write the end event to is after the space + reserved for this event. */ + pmOldIndex += pmLen; + + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* Write end event etc. and increment buffers_produced. */ + finalize_buffer(pmOldIndex & lIndexMask, lSizeLost, pmTimestamp); + + /* If we're here, we had a normal buffer switch and need to + update the start buffer time before writing the event. + The start buffer time is the same as the event time for the + event reserved, and lTimeDelta of 0 but that also appears + to be the case in the locking version as well. */ + sBufferStartTime = *pmTimestamp; + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + /* new_index is always valid here, since it's set correctly + if offset < len + sEndReserve, and we don't get here + unless that's true. The issue would be that if we didn't + actually switch buffers, new_index would be too large by + sEndReserve bytes. */ + write_start_buffer_event(lNewIndex & lIndexMask, *pmTimestamp); + + /* We initialize the new buffer by subtracting + TRACE_BUFFER_SIZE rather than directly initializing to + sStartReserve in case events have been already been added + to the new buffer under us. We subtract space for the start + buffer event from buffer size to leave room for the start + buffer event we just wrote. */ + lNewBufno = TRACE_BUFFER_NUMBER_GET(lNewIndex & lIndexMask, lOffsetBits); + atomic_sub_volatile(&sBufferControl.fill_count[lNewBufno], + TRACE_BUFFER_SIZE(lOffsetBits) - sStartReserve); + + /* We need to check whether fill_count is less than the + sStartReserve. If this test is true, it means that + subtracting the buffer size underflowed fill_count i.e. + fill_count represents an incomplete buffer. Any any case, + we're completely fubared and don't have any choice but to + start the new buffer out fresh. */ + if(atomic_read(&sBufferControl.fill_count[lNewBufno]) < sStartReserve) + atomic_set_volatile(&sBufferControl.fill_count[lNewBufno], sStartReserve); + + /* If we're here, there must have been a buffer switch */ + lBufferSwitched = LTT_BUFFER_SWITCH; + } + + return lBufferSwitched; +} + +/** + * trace_reserve: - Reserve a slot in the trace buffer for an event. 
+ * @pmLen: the length of the slot to reserve + * @pmIndex: variable that will receive the start pos of the reserved slot + * @pmTimestamp: variable that will receive the time the slot was reserved + * + * This is the fast path for reserving space in the trace buffer in the + * lockless tracing scheme. If a slot was successfully reserved, the + * caller can then at its leisure write data to the reserved space (at + * least until the space is reclaimed in an out-of-space situation). + * + * If the requested length would fill or exceed the current buffer, the + * slow path, trace_reserve_slow(), will be executed instead. + * + * The index reflecting the start position of the slot reserved will be + * saved in *pmIndex, and the timestamp reflecting the time the slot was + * reserved will be saved in *pmTimestamp. If the return value indicates + * a discarded event, the values in *pmIndex and *pmTimestamp will be + * indeterminate. + * + * The return value contains the result flags and is an ORed combination + * of the following: + * + * LTT_BUFFER_SWITCH_NONE - no buffer switch occurred + * LTT_EVENT_DISCARD_NONE - event should not be discarded + * LTT_BUFFER_SWITCH - buffer switch occurred + * LTT_EVENT_DISCARD - event should be discarded (all buffers are full) + * LTT_EVENT_TOO_LONG - event won't fit into even an empty buffer + */ +static inline int trace_reserve(u32 pmLen, + u32 *pmIndex, + struct timeval *pmTimestamp) +{ + u32 lOldIndex, lNewIndex, lOffset; + u32 lOffsetMask = sBufferControl.offset_mask; + + /* Do this until we reserve a spot for the event */ + do { + lOldIndex = sBufferControl.index; + + /* If adding len + sEndReserve to the old index doesn't put us + into a new buffer, this is what the new index would be. */ + lNewIndex = lOldIndex + pmLen; + lOffset = TRACE_BUFFER_OFFSET_GET(lNewIndex + sEndReserve, lOffsetMask); + + /* If adding the length reserved for the end buffer event and + lost count to the new index would put us into a new buffer, + we need to do a buffer switch. If in between now and the + buffer switch another event that does fit comes in, no + problem because we check again in the slow version. In + either case, there will always be room for the end event + in the old buffer. The trick in this test is that adding + a length that would carry into the non-offset bits of the + index results in the offset portion being smaller than the + length that was added. */ + if(lOffset < pmLen) + /* We would roll over into a new buffer, need to do + buffer switch processing. */ + return trace_reserve_slow(lOldIndex, pmLen, pmIndex, pmTimestamp); + + /* Get the time of the event */ + do_gettimeofday(pmTimestamp); + } while (!compare_and_store_volatile(&sBufferControl.index, + lOldIndex, lNewIndex)); + + /* Once we're successful in saving a new_index as the authoritative + new global buffer control index, we can return old_index, the + successfully reserved index. */ + + /* Return the reserved index value */ + *pmIndex = lOldIndex & sBufferControl.index_mask; + + return LTT_BUFFER_SWITCH_NONE; /* No buffer switch occurred */ +} + +/** + * lockless_write_event: - Locklessly reserves space and writes an event. + * @pmEventID: event id + * @pmEventStruct: event details + * @pmDataSize: total event size + * @pmCPUID: CPU ID associated with event + * @pmVarDataBeg: ptr to variable-length data for the event + * @pmVarDataLen: length of variable-length data for the event + * + * This is the main event-writing function for the lockless scheme. 
It + * reserves space for an event if possible, writes the event and signals + * the daemon if it caused a buffer switch. + */ +int lockless_write_event(u8 pmEventID, + void *pmEventStruct, + uint16_t pmDataSize, + u8 pmCPUID, + void *pmVarDataBeg, + int pmVarDataLen) +{ + u32 lReservedIndex; + struct timeval lTime; + trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */ + struct siginfo lSigInfo; /* Signal information */ + int lReserveRC; + char* lWritPos; /* Current position for writing */ + int lRC = 0; + + /* Reserve space for the event. If the space reserved is in a new + buffer, note that fact. */ + lReserveRC = trace_reserve((u32)pmDataSize, + &lReservedIndex, &lTime); + + /* Exact lost event count isn't important to anyone, so this is OK. */ + if(lReserveRC & LTT_EVENT_DISCARD) + sEventsLost++; + + /* We don't write the event, but we still need to signal */ + if((lReserveRC & LTT_BUFFER_SWITCH) && + (lReserveRC & LTT_EVENT_DISCARD)) { + lRC = -ENOMEM; + goto send_buffer_switch_signal; + } + + /* no buffer space left, discard event. */ + if((lReserveRC & LTT_EVENT_DISCARD) || + (lReserveRC & LTT_EVENT_TOO_LONG)) + /* return value for trace() */ + return -ENOMEM; + + /* The position we write to in the trace memory area is simply the + beginning of trace memory plus the index we just reserved. */ + lWritPos = sTracBuf + lReservedIndex; + /* Compute the time delta between this event and the time at which + this buffer was started */ + lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000 + + (lTime.tv_usec - sBufferStartTime.tv_usec); + + /* Write the CPUID to the tracing buffer, if required */ + if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) + tracer_write_to_buffer(lWritPos, + &pmCPUID, + sizeof(pmCPUID)); + + /* Write event type to tracing buffer */ + tracer_write_to_buffer(lWritPos, + &pmEventID, + sizeof(pmEventID)); + + /* Write event time delta to tracing buffer */ + tracer_write_to_buffer(lWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); + + /* Do we log event details */ + if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) { + /* Write event structure */ + tracer_write_to_buffer(lWritPos, + pmEventStruct, + sEventStructSize[pmEventID]); + + /* Write string if any */ + if (pmVarDataLen) + tracer_write_to_buffer(lWritPos, + pmVarDataBeg, + pmVarDataLen); + } + /* Write the length of the event description */ + tracer_write_to_buffer(lWritPos, + &pmDataSize, + sizeof(pmDataSize)); + + /* We've written the event - update the fill_count for the buffer. */ + trace_commit(lReservedIndex, (u32)pmDataSize); + +send_buffer_switch_signal: + + /* Signal the daemon if we switched buffers */ + if(lReserveRC & LTT_BUFFER_SWITCH) { + /* Setup signal information */ + lSigInfo.si_signo = SIGIO; + lSigInfo.si_errno = 0; + lSigInfo.si_code = SI_KERNEL; + +#if 0 + /* DEBUG */ + printk("<1> Sending SIGIO to %d \n", sDaemonTaskStruct->pid); +#endif + /* Signal the tracing daemon */ + send_sig_info(SIGIO, &lSigInfo, sDaemonTaskStruct); + } + + return lRC; +} + +/** + * continue_trace: - Continue a stopped trace. + * + * Continue a trace that's been temporarily stopped because all buffers + * were full. 
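For readers following lockless_write_event() above (the locking path in trace() further down emits the same byte sequence), an individual event record in the buffer works out to roughly the layout below. Field widths are inferred from the code in this file; trace_time_delta itself is defined in the headers rather than here.

    [u8  CPUID]         only if CPUID logging is enabled, and never for the
                        START / BUFFER_START events
    [u8  event ID]
    [time delta]        trace_time_delta, microseconds since the buffer's
                        start time
    [event struct]      only if the event's bit is set in the details mask
    [variable data]     e.g. the file name for open/exec, or custom data
    [u16 record size]   total size of the record, including this field itself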
+ */ +static inline void continue_trace(void) +{ + int lDiscardSize; + u32 lLastEventBufno; + u32 lLastBufferLostSize; + u32 lLastEventOffset; + u32 lNewIndex; + + /* A buffer's been consumed, and as we've been waiting around at the + end of the last one produced, the one after that must now be free */ + int lFreedBufno = sBufferControl.buffers_produced % sBufferControl.n_buffers; + + /* Start the new buffer out at the beginning */ + atomic_set_volatile(&sBufferControl.fill_count[lFreedBufno], sStartReserve); + + /* In the all-buffers-full case, sBufferControl.index is frozen at the + position of the first event that would have caused a buffer switch. + However, the fill_count for that buffer is not frozen and reflects + not only the lost size calculated at that point, but also any + smaller events that managed to write themselves at the end of the + last buffer (because there's technically still space at the end, + though it and all those contained events will be erased here). + Here we try to salvage if possible that last buffer, but to do + that, we need to subtract those pesky smaller events that managed + to get in. If after all that, another small event manages to + sneak in in the time it takes us to do this, well, we concede and + the daemon will toss that buffer. It's not the end of the world + if that happens, since that buffer actually marked the start of a + bunch of lost events which continues until a buffer is freed. */ + + /* Get the bufno and offset of the buffer containing the last event + logged before we had to stop for a buffer-full condition. */ + lLastEventOffset = TRACE_BUFFER_OFFSET_GET(sLastEventIndex, sBufferControl.offset_mask); + lLastEventBufno = TRACE_BUFFER_NUMBER_GET(sLastEventIndex, sBufferControl.offset_bits); + + /* We also need to know the lost size we wrote to that buffer when we + stopped */ + lLastBufferLostSize = TRACE_BUFFER_SIZE(sBufferControl.offset_bits) - lLastEventOffset; + + /* Since the time we stopped, some smaller events probably reserved + space and wrote themselves in, the sizes of which would have been + reflected in the fill_count. The total size of these events is + calculated here. */ + lDiscardSize = atomic_read(&sBufferControl.fill_count[lLastEventBufno]) + - lLastEventOffset + - lLastBufferLostSize; + + /* If there were events written after we stopped, subtract those from + the fill_count. If that doesn't fix things, the buffer either is + really incomplete, or another event snuck in, and we'll just stop + now and say we did what we could for it. */ + if(lDiscardSize > 0) + atomic_sub_volatile(&sBufferControl.fill_count[lLastEventBufno], lDiscardSize); + + /* Since our end buffer event probably got trounced, rewrite it in old + buffer. */ + write_end_buffer_event(sLastEventIndex & sBufferControl.index_mask, sLastEventTimeStamp); + + /* We also need to update the buffer start time and write the start + event for the next buffer, since we couldn't do it until now */ + do_gettimeofday(&sBufferStartTime); + + /* The current buffer control index is hanging around near the end of + the last buffer. So we add the buffer size and clear the offset to + get to the beginning of the newly freed buffer. */ + lNewIndex = sBufferControl.index + TRACE_BUFFER_SIZE(sBufferControl.offset_bits); + lNewIndex = TRACE_BUFFER_OFFSET_CLEAR(lNewIndex, sBufferControl.offset_mask) + sStartReserve; + write_start_buffer_event(lNewIndex & sBufferControl.index_mask, sBufferStartTime); + + /* Fixing up sBufferControl.index is simpler. 
Since a buffer has been + consumed, there's now at least one buffer free, and we can continue. + We start off the next buffer in a fresh state. Since nothing else + can be meaningfully updating the buffer control index, we can safely + do that here. 'Meaningfully' means that there may be cases of + smaller events managing to update the index in the last buffer but + they're essentially erased by the lost size of that buffer when + sBuffersFull was set. We need to restart the index at the beginning + of the next available buffer before turning off sBuffersFull, and + avoid an erroneous buffer switch. */ + sBufferControl.index = lNewIndex; + + /* Now we can continue reserving events */ + sBuffersFull = FALSE; +} + +/** + * tracer_set_n_buffers: - Sets the number of buffers. + * @pmNBuffers: number of buffers. + * + * Sets the number of buffers containing the trace data, valid only for + * lockless scheme, must be a power of 2. + * + * Returns: + * + * 0, Size setting went OK + * -EINVAL, not a power of 2 + */ +int tracer_set_n_buffers(int pmNBuffers) +{ + if(hweight32(pmNBuffers) != 1) /* Invalid if # set bits in word != 1 */ + return -EINVAL; + + /* Find position of one and only set bit */ + sBufnoBits = ffs(pmNBuffers) - 1; + + return 0; +} +#else +static void init_buffer_control(struct buffer_control * pmBC, + int pmUseLockless, + u8 pmBufnoBits, + u8 pmOffsetBits) +{ + pmBC->using_lockless = pmUseLockless; +} +static inline void write_start_buffer_event(u32 pmIndex, struct timeval pmTime) +{ +} +static inline void finalize_lockless_trace(void) +{ +} +static inline void continue_trace(void) +{ +} +int tracer_set_n_buffers(int pmNBuffers) +{ + return -EINVAL; +} +#endif /* CONFIG_LOCKLESS_TRACE */ + +/** + * trace: - Tracing function per se. + * @pmEventID: ID of event as defined in linux/trace.h + * @pmEventStruct: struct describing the event + * + * Returns: + * 0, if everything went OK (event got registered) + * -ENODEV, no tracing daemon opened the driver. + * -ENOMEM, no more memory to store events. + * -EBUSY, tracer not started yet. + * + * Note: + * The kernel has to be locked here because trace() could be called from + * an interrupt handling routine and from process service routine. + */ +int trace(u8 pmEventID, + void *pmEventStruct) +{ + int lVarDataLen = 0; /* Length of variable length data to be copied, if any */ + void *lVarDataBeg = NULL; /* Begining of variable length data to be copied */ + int lSendSignal = FALSE; /* Should the daemon be summoned */ + u8 lCPUID; /* CPUID of currently runing process */ + uint16_t lDataSize; /* Size of tracing data */ + struct siginfo lSigInfo; /* Signal information */ + struct timeval lTime; /* Event time */ + unsigned long int lFlags; /* CPU flags for lock */ + trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */ + struct task_struct *pIncomingProcess = NULL; /* Pointer to incoming process */ + + /* Is there a tracing daemon */ + if (sDaemonTaskStruct == NULL) + return -ENODEV; + + /* Is this the exit of a process? 
*/ + if ((pmEventID == TRACE_EV_PROCESS) && + (pmEventStruct != NULL) && + ((((trace_process *) pmEventStruct)->event_sub_id) == TRACE_EV_PROCESS_EXIT)) + trace_destroy_owners_events(current->pid); + + /* Do we trace the event */ + if ((sTracerStarted == TRUE) || (pmEventID == TRACE_EV_START) || (pmEventID == TRACE_EV_BUFFER_START)) + goto TraceEvent; + + return -EBUSY; + +TraceEvent: + /* Are we monitoring this event */ + if (!ltt_test_bit(pmEventID, &sTracedEvents)) + return 0; + + /* Always let the start event pass, whatever the IDs */ + if ((pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) { + /* Is this a scheduling change */ + if (pmEventID == TRACE_EV_SCHEDCHANGE) { + /* Get pointer to incoming process */ + pIncomingProcess = (struct task_struct *) (((trace_schedchange *) pmEventStruct)->in); + + /* Set PID information in schedchange event */ + (((trace_schedchange *) pmEventStruct)->in) = pIncomingProcess->pid; + } + /* Are we monitoring a particular process */ + if ((sTracingPID == TRUE) && (current->pid != sTracedPID)) { + /* Record this event if it is the scheduling change bringing in the traced PID */ + if (pIncomingProcess == NULL) + return 0; + else if (pIncomingProcess->pid != sTracedPID) + return 0; + } + /* Are we monitoring a particular process group */ + if ((sTracingPGRP == TRUE) && (current->pgrp != sTracedPGRP)) { + /* Record this event if it is the scheduling change bringing in a process of the traced PGRP */ + if (pIncomingProcess == NULL) + return 0; + else if (pIncomingProcess->pgrp != sTracedPGRP) + return 0; + } + /* Are we monitoring the processes of a given group of users */ + if ((sTracingGID == TRUE) && (current->egid != sTracedGID)) { + /* Record this event if it is the scheduling change bringing in a process of the traced GID */ + if (pIncomingProcess == NULL) + return 0; + else if (pIncomingProcess->egid != sTracedGID) + return 0; + } + /* Are we monitoring the processes of a given user */ + if ((sTracingUID == TRUE) && (current->euid != sTracedUID)) { + /* Record this event if it is the scheduling change bringing in a process of the traced UID */ + if (pIncomingProcess == NULL) + return 0; + else if (pIncomingProcess->euid != sTracedUID) + return 0; + } + } + + /* Compute size of tracing data */ + lDataSize = sizeof(pmEventID) + sizeof(lTimeDelta) + sizeof(lDataSize); + + /* Do we log the event details */ + if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) { + /* Update the size of the data entry */ + lDataSize += sEventStructSize[pmEventID]; + + /* Some events have variable length */ + switch (pmEventID) { + /* Is there a file name in this */ + case TRACE_EV_FILE_SYSTEM: + if ((((trace_file_system *) pmEventStruct)->event_sub_id == TRACE_EV_FILE_SYSTEM_EXEC) + || (((trace_file_system *) pmEventStruct)->event_sub_id == TRACE_EV_FILE_SYSTEM_OPEN)) { + /* Remember the string's begining and update size variables */ + lVarDataBeg = ((trace_file_system *) pmEventStruct)->file_name; + lVarDataLen = ((trace_file_system *) pmEventStruct)->event_data2 + 1; + lDataSize += (uint16_t) lVarDataLen; + } + break; + + /* Logging of a custom event */ + case TRACE_EV_CUSTOM: + lVarDataBeg = ((trace_custom *) pmEventStruct)->data; + lVarDataLen = ((trace_custom *) pmEventStruct)->data_size; + lDataSize += (uint16_t) lVarDataLen; + break; + } + } + + /* Do we record the CPUID */ + if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) { + /* Remember the CPUID */ + lCPUID = smp_processor_id(); + + /* Update 
the size of the data entry */ + lDataSize += sizeof(lCPUID); + } + +#if CONFIG_LOCKLESS_TRACE +/* Lock-free event-writing isn't available without cmpxchg */ +#if __HAVE_ARCH_CMPXCHG + /* If we're using the lockless scheme, we preempt the default path + here - nothing after this point in this function will be executed. + Note that even if we do have cmpxchg, we still want to have a + choice between the lock-free and locking schemes at run-time, thus + the using_lockless check. This used to be implemented as a kernel + hook, and will be again when/if kernel hooks are accepted into the + kernel. */ + if(sBufferControl.using_lockless) + return lockless_write_event(pmEventID, + pmEventStruct, + lDataSize, + lCPUID, + lVarDataBeg, + lVarDataLen); +#endif /* __HAVE_ARCH_CMPXCHG */ +#endif /* CONFIG_LOCKLESS_TRACE */ + + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* The following time calculations have to be done within the spinlock because + otherwise the event order could be inverted. */ + + /* Get the time of the event */ + do_gettimeofday(&lTime); + + /* Compute the time delta between this event and the time at which this buffer was started */ + lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000 + + (lTime.tv_usec - sBufferStartTime.tv_usec); + + /* Is there enough space left in the write buffer */ + if (sWritPos + lDataSize > sWritLimit) { + /* Have we already switched buffers and informed the daemon of it */ + if (sSignalSent == TRUE) { + /* We've lost another event */ + sEventsLost++; + + /* Bye, bye, now */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + return -ENOMEM; + } + /* We need to inform the daemon */ + lSendSignal = TRUE; + + /* Switch buffers */ + tracer_switch_buffers(lTime); + + /* Recompute the time delta since sBufferStartTime has changed because of the buffer change */ + lTimeDelta = (lTime.tv_sec - sBufferStartTime.tv_sec) * 1000000 + + (lTime.tv_usec - sBufferStartTime.tv_usec); + } + /* Write the CPUID to the tracing buffer, if required */ + if ((sLogCPUID == TRUE) && (pmEventID != TRACE_EV_START) && (pmEventID != TRACE_EV_BUFFER_START)) + tracer_write_to_buffer(sWritPos, + &lCPUID, + sizeof(lCPUID)); + + /* Write event type to tracing buffer */ + tracer_write_to_buffer(sWritPos, + &pmEventID, + sizeof(pmEventID)); + + /* Write event time delta to tracing buffer */ + tracer_write_to_buffer(sWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); + + /* Do we log event details */ + if (ltt_test_bit(pmEventID, &sLogEventDetailsMask)) { + /* Write event structure */ + tracer_write_to_buffer(sWritPos, + pmEventStruct, + sEventStructSize[pmEventID]); + + /* Write string if any */ + if (lVarDataLen) + tracer_write_to_buffer(sWritPos, + lVarDataBeg, + lVarDataLen); + } + /* Write the length of the event description */ + tracer_write_to_buffer(sWritPos, + &lDataSize, + sizeof(lDataSize)); + + /* Should the tracing daemon be notified */ + if (lSendSignal == TRUE) { + /* Remember that a signal has been sent */ + sSignalSent = TRUE; + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + /* Setup signal information */ + lSigInfo.si_signo = SIGIO; + lSigInfo.si_errno = 0; + lSigInfo.si_code = SI_KERNEL; + + /* DEBUG */ +#if 0 + printk("<1> Sending SIGIO to %d \n", sDaemonTaskStruct->pid); +#endif + + /* Signal the tracing daemon */ + send_sig_info(SIGIO, &lSigInfo, sDaemonTaskStruct); + } else + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + return 0; +} + +/** + * tracer_switch_buffers: - Switches between 
read and write buffers. + * @pmTime: current time. + * + * Put the current write buffer to be read and reset put the old read + * buffer to be written to. Set the tracer variables in consequence. + * + * No return values. + * + * This should be called from within a spin_lock. + */ +void tracer_switch_buffers(struct timeval pmTime) +{ + char *lTempBuf; /* Temporary buffer pointer */ + char *lTempBufEnd; /* Temporary buffer end pointer */ + char *lInitWritPos; /* Initial write position */ + u8 lEventID; /* Event ID of last event */ + u8 lCPUID; /* CPUID of currently runing process */ + uint16_t lDataSize; /* Size of tracing data */ + u32 lSizeLost; /* Size delta between last event and end of buffer */ + trace_time_delta lTimeDelta; /* The time elapsed between now and the last event */ + trace_buffer_start lStartBufferEvent; /* Start of the new buffer event */ + + /* Remember initial write position */ + lInitWritPos = sWritPos; + + /* Write the end event at the write of the buffer */ + + /* Write the CPUID to the tracing buffer, if required */ + if (sLogCPUID == TRUE) { + lCPUID = smp_processor_id(); + tracer_write_to_buffer(sWritPos, + &lCPUID, + sizeof(lCPUID)); + } + /* Write event type to tracing buffer */ + lEventID = TRACE_EV_BUFFER_END; + tracer_write_to_buffer(sWritPos, + &lEventID, + sizeof(lEventID)); + + /* Write event time delta to tracing buffer */ + lTimeDelta = 0; + tracer_write_to_buffer(sWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); + + /* Get size lost */ + lSizeLost = sWritBufEnd - lInitWritPos; + + /* Write size lost at the end of the buffer */ + *((u32 *) (sWritBufEnd - sizeof(lSizeLost))) = lSizeLost; + + /* Switch buffers */ + lTempBuf = sReadBuf; + sReadBuf = sWritBuf; + sWritBuf = lTempBuf; + + /* Set buffer ends */ + lTempBufEnd = sReadBufEnd; + sReadBufEnd = sWritBufEnd; + sWritBufEnd = lTempBufEnd; + + /* Set read limit */ + sReadLimit = sReadBufEnd; + + /* Set write limit */ + sWritLimit = sWritBufEnd - TRACER_LAST_EVENT_SIZE; + + /* Set write position */ + sWritPos = sWritBuf; + + /* Increment buffer ID */ + sBufferID++; + + /* Set the time of begining of this buffer */ + sBufferStartTime = pmTime; + + /* Write the start of buffer event */ + lStartBufferEvent.ID = sBufferID; + lStartBufferEvent.Time = pmTime; + + /* Write event type to tracing buffer */ + lEventID = TRACE_EV_BUFFER_START; + tracer_write_to_buffer(sWritPos, + &lEventID, + sizeof(lEventID)); + + /* Write event time delta to tracing buffer */ + lTimeDelta = 0; + tracer_write_to_buffer(sWritPos, + &lTimeDelta, + sizeof(lTimeDelta)); + + /* Write event structure */ + tracer_write_to_buffer(sWritPos, + &lStartBufferEvent, + sizeof(lStartBufferEvent)); + + /* Compute the data size */ + lDataSize = sizeof(lEventID) + + sizeof(lTimeDelta) + + sizeof(lStartBufferEvent) + + sizeof(lDataSize); + + /* Write the length of the event description */ + tracer_write_to_buffer(sWritPos, + &lDataSize, + sizeof(lDataSize)); +} + +/** + * tracer_ioctl: - "Ioctl" file op + * + * @pmInode: the inode associated with the device + * @pmFile: file structure given to the acting process + * @pmCmd: command given by the caller + * @pmArg: arguments to the command + * + * Returns: + * >0, In case the caller requested the number of events lost. 
+ * 0, Everything went OK + * -ENOSYS, no such command + * -EINVAL, tracer not properly configured + * -EBUSY, tracer can't be reconfigured while in operation + * -ENOMEM, no more memory + * -EFAULT, unable to access user space memory + * + * Note: + * In the future, this function should check to make sure that it's the + * server that make thes ioctl. + */ +int tracer_ioctl(struct inode *pmInode, + struct file *pmFile, + unsigned int pmCmd, + unsigned long pmArg) +{ + int lRetValue; /* Function return value */ + int lDevMinor; /* Device minor number */ + int lNewUserEventID; /* ID of newly created user event */ + trace_start lStartEvent; /* Event marking the begining of the trace */ + unsigned long int lFlags; /* CPU flags for lock */ + trace_custom lUserEvent; /* The user event to be logged */ + trace_change_mask lTraceMask; /* Event mask */ + trace_new_event lNewUserEvent; /* The event to be created for the user */ + trace_buffer_start lStartBufferEvent; /* Start of the new buffer event */ + + /* Get device's minor number */ + lDevMinor = minor(pmInode->i_rdev) & 0x0f; + + /* If the tracer is started, the daemon can't modify the configuration */ + if ((lDevMinor == 0) + && (sTracerStarted == TRUE) + && (pmCmd != TRACER_STOP) + && (pmCmd != TRACER_DATA_COMITTED) + && (pmCmd != TRACER_GET_BUFFER_CONTROL)) + return -EBUSY; + + /* Only some operations are permitted to user processes trying to log events */ + if ((lDevMinor == 1) + && (pmCmd != TRACER_CREATE_USER_EVENT) + && (pmCmd != TRACER_DESTROY_USER_EVENT) + && (pmCmd != TRACER_TRACE_USER_EVENT) + && (pmCmd != TRACER_SET_EVENT_MASK) + && (pmCmd != TRACER_GET_EVENT_MASK)) + return -ENOSYS; + + /* Depending on the command executed */ + switch (pmCmd) { + /* Start the tracer */ + case TRACER_START: + /* Initialize buffer control regardless of scheme in use */ + init_buffer_control(&sBufferControl, + !sUseLocking, /* using_lockless */ + sBufnoBits, /* bufno_bits, 2**n */ + sBufOffsetBits); /* offset_bits, 2**n */ + + /* Check if the device has been properly set up */ + if (((sUseSyscallEIPBounds == TRUE) + && (sSyscallEIPDepthSet == TRUE)) + || ((sUseSyscallEIPBounds == TRUE) + && ((sLowerEIPBoundSet != TRUE) + || (sUpperEIPBoundSet != TRUE))) + || ((sTracingPID == TRUE) + && (sTracingPGRP == TRUE))) + return -EINVAL; + + /* Set the kernel-side trace configuration */ + if (trace_set_config(trace, + sSyscallEIPDepthSet, + sUseSyscallEIPBounds, + sSyscallEIPDepth, + sLowerEIPBound, + sUpperEIPBound) < 0) + return -EINVAL; + + /* Always log the start event and the buffer start event */ + ltt_set_bit(TRACE_EV_BUFFER_START, &sTracedEvents); + ltt_set_bit(TRACE_EV_BUFFER_START, &sLogEventDetailsMask); + ltt_set_bit(TRACE_EV_START, &sTracedEvents); + ltt_set_bit(TRACE_EV_START, &sLogEventDetailsMask); + ltt_set_bit(TRACE_EV_CHANGE_MASK, &sTracedEvents); + ltt_set_bit(TRACE_EV_CHANGE_MASK, &sLogEventDetailsMask); + + /* Get the time of start */ + do_gettimeofday(&sBufferStartTime); + + /* Set the event description */ + lStartBufferEvent.ID = sBufferID; + lStartBufferEvent.Time = sBufferStartTime; + + /* Set the event description */ + lStartEvent.MagicNumber = TRACER_MAGIC_NUMBER; + lStartEvent.ArchType = TRACE_ARCH_TYPE; + lStartEvent.ArchVariant = TRACE_ARCH_VARIANT; + lStartEvent.SystemType = TRACE_SYS_TYPE_VANILLA_LINUX; + lStartEvent.MajorVersion = TRACER_VERSION_MAJOR; + lStartEvent.MinorVersion = TRACER_VERSION_MINOR; + lStartEvent.BufferSize = sBufSize; + lStartEvent.EventMask = sTracedEvents; + lStartEvent.DetailsMask = 
sLogEventDetailsMask; + lStartEvent.LogCPUID = sLogCPUID; + + /* Trace the buffer start event using the appropriate method depending on the locking scheme */ + if(sBufferControl.using_lockless == TRUE) + write_start_buffer_event(sBufferControl.index & sBufferControl.index_mask, + sBufferStartTime); + else + trace(TRACE_EV_BUFFER_START, &lStartBufferEvent); + + /* Trace the start event */ + trace(TRACE_EV_START, &lStartEvent); + + /* Start tapping into Linux's syscall flow */ + syscall_entry_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_ENTRY, &sTracedEvents); + syscall_exit_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_EXIT, &sTracedEvents); + + /* We can start tracing */ + sTracerStarted = TRUE; + + /* Reregister custom trace events created earlier */ + trace_reregister_custom_events(); + break; + + /* Stop the tracer */ + case TRACER_STOP: + /* Stop tracing */ + /* We don't log new events, but old lockless ones can finish */ + sTracerStarted = FALSE; + + /* Stop interrupting the normal flow of system calls */ + syscall_entry_trace_active = 0; + syscall_exit_trace_active = 0; + + /* Make sure the last buffer touched is finalized */ + if(sBufferControl.using_lockless) { + /* Write end buffer event as last event in old buf. */ + finalize_lockless_trace(); + break; + } /* Else locking scheme */ + + /* Acquire the lock to avoid SMP case of where another CPU is writing a trace + while buffer is being switched */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* Switch the buffers to ensure that the end of the buffer mark is set (time isn't important) */ + tracer_switch_buffers(sBufferStartTime); + + /* Release lock */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + break; + + /* Set the tracer to the default configuration */ + case TRACER_CONFIG_DEFAULT: + tracer_set_default_config(); + break; + + /* Set the memory buffers the daemon wants us to use */ + case TRACER_CONFIG_MEMORY_BUFFERS: + /* Is the given size "reasonable" */ + if (sUseLocking == TRUE) { + if (pmArg < TRACER_MIN_BUF_SIZE) + return -EINVAL; + } else { + if ((pmArg < TRACER_LOCKLESS_MIN_BUF_SIZE) || + (pmArg > TRACER_LOCKLESS_MAX_BUF_SIZE)) + return -EINVAL; + } + + /* Set the buffer's size */ + return tracer_set_buffer_size(pmArg); + break; + + /* Set the number of memory buffers the daemon wants us to use */ + case TRACER_CONFIG_N_MEMORY_BUFFERS: + /* Is the given size "reasonable" */ + if ((sUseLocking == TRUE) || (pmArg < TRACER_MIN_BUFFERS) || + (pmArg > TRACER_MAX_BUFFERS)) + return -EINVAL; + + /* Set the number of buffers */ + return tracer_set_n_buffers(pmArg); + break; + + /* Set locking scheme the daemon wants us to use */ + case TRACER_CONFIG_USE_LOCKING: + /* Set the locking scheme in a global for later */ + sUseLocking = pmArg; +#if !(CONFIG_LOCKLESS_TRACE && __HAVE_ARCH_CMPXCHG) + if(sUseLocking == FALSE) /* Trying to use lock-free scheme */ + /* Lock-free scheme not supported on this platform */ + return -EINVAL; +#endif + break; + + /* Trace the given events */ + case TRACER_CONFIG_EVENTS: + if (copy_from_user(&sTracedEvents, (void *) pmArg, sizeof(sTracedEvents))) + return -EFAULT; + break; + + /* Record the details of the event, or not */ + case TRACER_CONFIG_DETAILS: + if (copy_from_user(&sLogEventDetailsMask, (void *) pmArg, sizeof(sLogEventDetailsMask))) + return -EFAULT; + break; + + /* Record the CPUID associated with the event */ + case TRACER_CONFIG_CPUID: + sLogCPUID = TRUE; + break; + + /* Trace only one process */ + case TRACER_CONFIG_PID: + sTracingPID = TRUE; + sTracedPID = pmArg; + break; + + /* 
Trace only the given process group */ + case TRACER_CONFIG_PGRP: + sTracingPGRP = TRUE; + sTracedPGRP = pmArg; + break; + + /* Trace the processes of a given group of users */ + case TRACER_CONFIG_GID: + sTracingGID = TRUE; + sTracedGID = pmArg; + break; + + /* Trace the processes of a given user */ + case TRACER_CONFIG_UID: + sTracingUID = TRUE; + sTracedUID = pmArg; + break; + + /* Set the call depth a which the EIP should be fetched on syscall */ + case TRACER_CONFIG_SYSCALL_EIP_DEPTH: + sSyscallEIPDepthSet = TRUE; + sSyscallEIPDepth = pmArg; + break; + + /* Set the lowerbound address from which EIP is recorded on syscall */ + case TRACER_CONFIG_SYSCALL_EIP_LOWER: + /* We are using bounds for fetching the EIP where syscall was made */ + sUseSyscallEIPBounds = TRUE; + + /* Set the lower bound */ + sLowerEIPBound = (void *) pmArg; + + /* The lower bound has been set */ + sLowerEIPBoundSet = TRUE; + break; + + /* Set the upperbound address from which EIP is recorded on syscall */ + case TRACER_CONFIG_SYSCALL_EIP_UPPER: + /* We are using bounds for fetching the EIP where syscall was made */ + sUseSyscallEIPBounds = TRUE; + + /* Set the upper bound */ + sUpperEIPBound = (void *) pmArg; + + /* The upper bound has been set */ + sUpperEIPBoundSet = TRUE; + break; + + /* The daemon has comitted the last trace */ + case TRACER_DATA_COMITTED: +#if 0 + /* DEBUG */ + printk("Tracer: Data has been committed \n"); +#endif + + /* The lockless version doesn't use sSignalSent. pmArg is the + number of buffers the daemon has told us it just consumed. + Add that to the global count. */ + if(sBufferControl.using_lockless) { + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* We consumed some buffers, note it. */ + sBufferControl.buffers_consumed += (u32)pmArg; + + /* If we were full, we no longer are */ + if(sBuffersFull && ((u32)pmArg > 0)) + continue_trace(); + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + break; + } /* Else locking version below */ + + /* Safely set the signal sent flag to FALSE */ + spin_lock_irqsave(&sSpinLock, lFlags); + sSignalSent = FALSE; + spin_unlock_irqrestore(&sSpinLock, lFlags); + break; + + /* Get the number of events lost */ + case TRACER_GET_EVENTS_LOST: + return sEventsLost; + break; + + /* Create a user event */ + case TRACER_CREATE_USER_EVENT: + /* Copy the information from user space */ + if (copy_from_user(&lNewUserEvent, (void *) pmArg, sizeof(lNewUserEvent))) + return -EFAULT; + + /* Create the event */ + lNewUserEventID = trace_create_owned_event(lNewUserEvent.type, + lNewUserEvent.desc, + lNewUserEvent.format_type, + lNewUserEvent.form, + current->pid); + + /* Has the operation succeded */ + if (lNewUserEventID >= 0) { + /* Set the event ID */ + lNewUserEvent.id = lNewUserEventID; + + /* Copy the event information back to user space */ + if (copy_to_user((void *) pmArg, &lNewUserEvent, sizeof(lNewUserEvent))) { + /* Since we were unable to tell the user about the event, destroy it */ + trace_destroy_event(lNewUserEventID); + return -EFAULT; + } + } else + /* Forward trace_create_event()'s error code */ + return lNewUserEventID; + break; + + /* Destroy a user event */ + case TRACER_DESTROY_USER_EVENT: + /* Pass on the user's request */ + trace_destroy_event((int) pmArg); + break; + + /* Trace a user event */ + case TRACER_TRACE_USER_EVENT: + /* Copy the information from user space */ + if (copy_from_user(&lUserEvent, (void *) pmArg, sizeof(lUserEvent))) + return -EFAULT; + + /* Copy the user event data */ + if 
(copy_from_user(sUserEventData, lUserEvent.data, lUserEvent.data_size)) + return -EFAULT; + + /* Log the raw event */ + lRetValue = trace_raw_event(lUserEvent.id, + lUserEvent.data_size, + sUserEventData); + + /* Has the operation failed */ + if (lRetValue < 0) + /* Forward trace_create_event()'s error code */ + return lRetValue; + break; + + /* Set event mask */ + case TRACER_SET_EVENT_MASK: + /* Copy the information from user space */ + if (copy_from_user(&(lTraceMask.mask), (void *) pmArg, sizeof(lTraceMask.mask))) + return -EFAULT; + + /* Trace the event */ + lRetValue = trace(TRACE_EV_CHANGE_MASK, &lTraceMask); + + /* Change the event mask. (This has to be done second or else may loose the + information if the user decides to stop logging "change mask" events) */ + memcpy(&sTracedEvents, &(lTraceMask.mask), sizeof(lTraceMask.mask)); + syscall_entry_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_ENTRY, &sTracedEvents); + syscall_exit_trace_active = ltt_test_bit(TRACE_EV_SYSCALL_EXIT, &sTracedEvents); + + /* Always trace the buffer start, the trace start and the change mask */ + ltt_set_bit(TRACE_EV_BUFFER_START, &sTracedEvents); + ltt_set_bit(TRACE_EV_START, &sTracedEvents); + ltt_set_bit(TRACE_EV_CHANGE_MASK, &sTracedEvents); + + /* Forward trace()'s error code */ + return lRetValue; + break; + + /* Get event mask */ + case TRACER_GET_EVENT_MASK: + /* Copy the information to user space */ + if (copy_to_user((void *) pmArg, &sTracedEvents, sizeof(sTracedEvents))) + return -EFAULT; + break; + + /* Get buffer control data */ + case TRACER_GET_BUFFER_CONTROL: + /* We can't copy_to_user() with a lock held (accessing user + memory may cause a page fault), so buffers_produced may + actually be larger than what the daemon sees when this + snapshot is taken. This isn't a problem because the + daemon will get a chance to read the new buffer the next + time it's signaled. */ + /* Copy the buffer control information to user space */ + if(copy_to_user((void *) pmArg, &sBufferControl, sizeof(sBufferControl))) + return -EFAULT; + break; + + /* Unknown command */ + default: + return -ENOSYS; + } + + return 0; +} + +/** + * tracer_mmap: - "Mmap" file op + * @pmInode: the inode associated with the device + * @pmFile: file structure given to the acting process + * @pmVmArea: Virtual memory area description structure + * + * Returns: + * 0 if ok + * -EAGAIN, when remap failed + * -EACCESS, permission denied + */ +int tracer_mmap(struct file *pmFile, + struct vm_area_struct *pmVmArea) +{ + int lRetValue; /* Function's return value */ + + /* Only the trace daemon is allowed access to mmap */ + if (current != sDaemonTaskStruct) + return -EACCES; + + /* Remap trace buffer into the process's memory space */ + lRetValue = tracer_mmap_region(pmVmArea, + (char *) pmVmArea->vm_start, + sTracBuf, + pmVmArea->vm_end - pmVmArea->vm_start); + +#if 0 + printk("Tracer: Trace buffer virtual address => 0x%08X \n", (u32) sTracBuf); + printk("Tracer: Trace buffer physical address => 0x%08X \n", (u32) virt_to_phys(sTracBuf)); + printk("Tracer: Trace buffer virtual address in daemon space => 0x%08X \n", (u32) pmVmArea->vm_start); + printk("Tracer: Trace buffer physical address in daemon space => 0x%08X \n", (u32) virt_to_phys((void *) pmVmArea->vm_start)); +#endif + + return lRetValue; +} + +/** + * tracer_open(): - "Open" file op + * @pmInode: the inode associated with the device + * @pmFile: file structure given to the acting process + * + * Returns: + * 0, everything went OK + * -ENODEV, no such device. 
+ * -EBUSY, daemon channel (minor number 0) already in use. + */ +int tracer_open(struct inode *pmInode, + struct file *pmFile) +{ + int lDevMinor = minor(pmInode->i_rdev) & 0x0f; /* Device minor number */ + + /* Only minor number 0 and 1 are used */ + if ((lDevMinor > 0) && (lDevMinor != 1)) + return -ENODEV; + + /* If the device has already been opened */ + if (sOpenCount) { + /* Is there another process trying to open the daemon's channel (minor number 0) */ + if (lDevMinor == 0) + return -EBUSY; + else + /* Only increment use, this is just another user process trying to log user events */ + goto IncrementUse; + } + /* Fetch the task structure of the process that opened the device */ + sDaemonTaskStruct = current; + + /* Reset the default configuration since this is the daemon and he will complete the setup */ + tracer_set_default_config(); + +#if 0 + /* DEBUG */ + printk("<1>Process %d opened the tracing device \n", sDaemonTaskStruct->pid); +#endif + +IncrementUse: + /* Lock the device */ + sOpenCount++; + +#ifdef MODULE + /* Increment module usage */ + MOD_INC_USE_COUNT; +#endif + + return 0; +} + +/** + * tracer_release: - "Release" file op + * @pmInode: the inode associated with the device + * @pmFile: file structure given to the acting process + * + * Returns: + * 0, everything went OK + * -EBUSY, there are still event writes in progress so the buffer can't + * be released. + * + * Note: + * It is assumed that if the tracing daemon dies, exits or simply stops + * existing, the kernel or "someone" will call tracer_release. Otherwise, + * we're in trouble ... + */ +int tracer_release(struct inode *pmInode, + struct file *pmFile) +{ + int lCount; + int lDevMinor = minor(pmInode->i_rdev) & 0x0f; /* Device minor number */ + + /* Is this a simple user process exiting? */ + if (lDevMinor != 0) + goto DecrementUse; + + /* Did we loose any events */ + if (sEventsLost > 0) + printk(KERN_ALERT "Tracer: Lost %d events \n", sEventsLost); + + /* Reset the daemon PID */ + sDaemonTaskStruct = NULL; + + /* Free the current buffers, if any, but only if they're not still + in use */ + if (sTracBuf != NULL) { + lCount = trace_get_pending_write_count(); + if(lCount == 0) + rvfree(sTracBuf, sAllocSize); + else { + printk(KERN_ERR "Tracer: Couldn't release tracer - %d event writes pending \n", + lCount); + return -EBUSY; + } + } + + /* Reset the read and write buffers */ + sTracBuf = NULL; + sWritBuf = NULL; + sReadBuf = NULL; + sWritBufEnd = NULL; + sReadBufEnd = NULL; + sWritPos = NULL; + sReadLimit = NULL; + sWritLimit = NULL; + sUseLocking = TRUE; + + /* Reset the tracer's configuration */ + tracer_set_default_config(); + sTracerStarted = FALSE; + + /* Reset number of bytes recorded and number of events lost */ + sBufReadComplete = 0; + sSizeReadIncomplete = 0; + sEventsLost = 0; + + /* Reset signal sent */ + sSignalSent = FALSE; + +DecrementUse: + /* Unlock the device */ + sOpenCount--; + +#ifdef MODULE + /* Decrement module usage */ + MOD_DEC_USE_COUNT; +#endif + + return 0; +} + +/** + * tracer_fsync: - "Fsync" file op + * @pmFile: file structure given to the acting process + * @pmDEntry: dentry associated with file + * + * Returns: + * 0, everything went OK + * -EACCESS, permission denied + * + * Note: + * We need to look the modifications of the values because they are read + * and written by trace(). 
+ */ +int tracer_fsync(struct file *pmFile, + struct dentry *pmDEntry, + int pmDataSync) +{ + unsigned long int lFlags; + + /* Only the trace daemon is allowed access to fsync */ + if (current != sDaemonTaskStruct) + return -EACCES; + + /* Lock the kernel */ + spin_lock_irqsave(&sSpinLock, lFlags); + + /* Reset the write positions */ + sWritPos = sWritBuf; + + /* Reset read limit */ + sReadLimit = sReadBuf; + + /* Reset bytes recorded */ + sBufReadComplete = 0; + sSizeReadIncomplete = 0; + sEventsLost = 0; + + /* Reset signal sent */ + sSignalSent = FALSE; + + /* Unlock the kernel */ + spin_unlock_irqrestore(&sSpinLock, lFlags); + + return 0; +} + +/** + * tracer_set_buffer_size: - Sets the size of the buffers. + * @pmSize: Size of buffers + * + * Returns: + * 0, Size setting went OK + * -ENOMEM, unable to get a hold of memory for tracer + * + * sBufnoBits must have already been set before this function is called. + */ +int tracer_set_buffer_size(int pmSize) +{ + int lSizeAlloc; + int lNBuffers = TRACE_MAX_BUFFER_NUMBER(sBufnoBits); + + if(sUseLocking == TRUE) + /* Set size to allocate (= pmSize * 2) and fix it's size to be on a page boundary */ + lSizeAlloc = FIX_SIZE(pmSize << 1); + else { + /* Calculate power-of-2 buffer size */ + if(hweight32(pmSize) != 1) + /* Invalid if # set bits != 1 */ + return -EINVAL; + + /* Find position of one and only set bit */ + sBufOffsetBits = ffs(pmSize) - 1; + + /* Calculate total size of buffers */ + lSizeAlloc = pmSize * lNBuffers; + + /* Sanity check */ + if(lSizeAlloc > TRACER_LOCKLESS_MAX_TOTAL_BUF_SIZE) + return -EINVAL; + } + + /* Free the current buffers, if any, but only if they're not still in use */ + if (sTracBuf != NULL) { + if(trace_get_pending_write_count() == 0) + rvfree(sTracBuf, sAllocSize); + else + return -EBUSY; + } + + /* Allocate space for the tracing buffers */ + if ((sTracBuf = (char *) rvmalloc(lSizeAlloc)) == NULL) + return -ENOMEM; + +#if 0 /* DEBUG - init all of buffer with easy-to-spot default values */ + { + int i; + for(i=0; i> (offset_bits)) +#define TRACE_BUFFER_OFFSET_GET(index, mask) ((index) & (mask)) +#define TRACE_BUFFER_OFFSET_CLEAR(index, mask) ((index) & ~(mask)) + +/* Flags returned by trace_reserve/trace_reserve_slow */ +#define LTT_BUFFER_SWITCH_NONE 0x00 +#define LTT_EVENT_DISCARD_NONE 0x00 +#define LTT_BUFFER_SWITCH 0x01 +#define LTT_EVENT_DISCARD 0x02 +#define LTT_EVENT_TOO_LONG 0x04 + +/* Structure used for communicating buffer info between tracer and daemon + for lock-free tracing. This is a per-buffer (CPU, etc.) data structure. */ +struct buffer_control +{ + int using_lockless; + u32 index; + u8 bufno_bits; + u32 n_buffers; /* cached value */ + u8 offset_bits; + u32 offset_mask; /* cached value */ + u32 index_mask; /* cached value */ + + u32 buffers_produced; + u32 buffers_consumed; +#if CONFIG_LOCKLESS_TRACE + /* atomic_t has only 24 usable bits, limiting us to 16M buffers */ + atomic_t fill_count[TRACER_MAX_BUFFERS]; +#endif /* CONFIG_LOCKLESS_TRACE */ +}; + +/* If cmpxchg isn't defined for the architecture, we don't want to + generate a link error - the locking scheme will still be available. 
*/ +#ifndef __HAVE_ARCH_CMPXCHG +#define cmpxchg(p,o,n) 0 +#endif + +extern __inline__ int ltt_set_bit(int nr, void *addr) +{ + unsigned char *p = addr; + unsigned char mask = 1 << (nr & 7); + unsigned char old; + + p += nr >> 3; + old = *p; + *p |= mask; + + return ((old & mask) != 0); +} + +extern __inline__ int ltt_clear_bit(int nr, void *addr) +{ + unsigned char *p = addr; + unsigned char mask = 1 << (nr & 7); + unsigned char old; + + p += nr >> 3; + old = *p; + *p &= ~mask; + + return ((old & mask) != 0); +} + +extern __inline__ int ltt_test_bit(int nr, void *addr) +{ + unsigned char *p = addr; + unsigned char mask = 1 << (nr & 7); + + p += nr >> 3; + + return ((*p & mask) != 0); +} + +/* Function prototypes */ +int trace + (u8, + void *); +void tracer_switch_buffers + (struct timeval); +int tracer_ioctl + (struct inode *, + struct file *, + unsigned int, + unsigned long); +int tracer_mmap + (struct file *, + struct vm_area_struct *); +int tracer_open + (struct inode *, + struct file *); +int tracer_release + (struct inode *, + struct file *); +int tracer_fsync + (struct file *, + struct dentry *, + int); +#ifdef MODULE +void tracer_exit + (void); +#endif /* #ifdef MODULE */ +int tracer_set_buffer_size + (int); +int tracer_set_n_buffers + (int); +int tracer_set_default_config + (void); +int tracer_init + (void); +#endif /* _TRACER_H */
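
For readers unfamiliar with the lock-free scheme, here is a small stand-alone illustration (not part of the patch) of how the single 32-bit index in struct buffer_control decomposes into a buffer number and a byte offset via bufno_bits/offset_bits, in the spirit of the OFFSET_GET/OFFSET_CLEAR macros above. The example values are arbitrary.

/* Stand-alone sketch: a single 32-bit write index encodes both the
   buffer number (upper bufno_bits bits) and the offset within that
   buffer (lower offset_bits bits).  Example values are arbitrary. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint8_t  bufno_bits  = 2;				/* 2**2 = 4 buffers        */
	uint8_t  offset_bits = 13;				/* 2**13 = 8KB per buffer  */
	uint32_t offset_mask = (1U << offset_bits) - 1;		/* as cached in the struct */
	uint32_t index_mask  = (1U << (bufno_bits + offset_bits)) - 1;

	/* Pretend the index has already wrapped around the 4 buffers once. */
	uint32_t index  = (7U << offset_bits) + 100;
	uint32_t bufno  = (index & index_mask) >> offset_bits;	/* -> buffer 3   */
	uint32_t offset = index & offset_mask;			/* -> offset 100 */

	printf("buffer %u, offset %u\n", (unsigned) bufno, (unsigned) offset);
	return 0;
}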
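
And, in case it helps review, a rough user-space sketch of the daemon-side sequence implied by the ioctl() documentation above (open, configure, mmap, start, commit on SIGIO), using the default locking scheme. This is purely illustrative and not part of the patch: the /dev/tracer node name, the header path and the 64KB buffer size are assumptions; the TRACER_* commands are the ones defined by the driver.

/*
 * Hypothetical trace-daemon skeleton (NOT part of this patch).
 * Assumed: "/dev/tracer" device node, locally copied tracer.h,
 * 64KB buffers (must be >= TRACER_MIN_BUF_SIZE).
 */
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include "tracer.h"			/* TRACER_* ioctls; install path assumed */

#define BUF_SIZE (64 * 1024)		/* example size, page-aligned */

static volatile sig_atomic_t buffer_ready;

static void sigio_handler(int signo)
{
	/* The driver sends SIGIO to the daemon on every buffer switch. */
	(void) signo;
	buffer_ready = 1;
}

int main(void)
{
	char *trace_buf;
	int i, fd;

	/* Minor 0 is the daemon channel; only one opener is allowed. */
	fd = open("/dev/tracer", O_RDONLY);	/* device node name assumed */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	signal(SIGIO, sigio_handler);

	/* All configuration must happen before TRACER_START (-EBUSY after). */
	if (ioctl(fd, TRACER_CONFIG_DEFAULT, 0) < 0 ||
	    ioctl(fd, TRACER_CONFIG_USE_LOCKING, 1) < 0 ||
	    ioctl(fd, TRACER_CONFIG_MEMORY_BUFFERS, BUF_SIZE) < 0) {
		perror("ioctl(config)");
		return 1;
	}

	/* The locking scheme allocates a read+write double buffer. */
	trace_buf = mmap(NULL, 2 * BUF_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (trace_buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	if (ioctl(fd, TRACER_START, 0) < 0) {
		perror("ioctl(TRACER_START)");
		return 1;
	}

	/* Consume a handful of buffer switches, then stop. */
	for (i = 0; i < 10; ) {
		pause();			/* woken up by SIGIO */
		if (!buffer_ready)
			continue;
		buffer_ready = 0;
		i++;

		/* ... a real daemon writes the read buffer to the trace file here ... */

		/* Tell the driver the data was consumed (arg unused when locking). */
		ioctl(fd, TRACER_DATA_COMITTED, 0);
	}

	ioctl(fd, TRACER_STOP, 0);
	printf("events lost: %d\n", ioctl(fd, TRACER_GET_EVENTS_LOST, 0));

	munmap(trace_buf, 2 * BUF_SIZE);
	close(fd);
	return 0;
}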