2009-06-11 16:03:54

by Ingo Molnar

Subject: [GIT PULL] Performance Counters for Linux

Linus,

Please consider pulling the performance counters Git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git perfcounters-for-linus

The -v8 version was announced at:

http://lwn.net/Articles/336542/

[ Changes since -v8: cleanups of the ABI, broader cachemiss-counter
enumeration support, Power7 arch support, Jato JIT dynamic symbols
support, improved auto-freq sampling, gentler fall-back in tools
in case of missing PMU features, and general tidyups. ]

Performance Counters for Linux is a new subsystem that offers
unified handling and tooling for all things performance analysis: it
provides an fd based 'counter' abstraction to expose a wide set of
performance measurement features:

- The counting of hardware and software events

- Sampling/tracing of events

- Various counter-scheduling and workload-measurement features (per
task, per CPU, per child hierarchy, etc.)

- Self and remote measurements

- CPU-independent abstraction/enumeration of hardware and software
events that tries to distill a more or less uniformly available
set of events.
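
As a rough illustration of the fd based abstraction listed above, here is a minimal self-measurement sketch in C. It is not taken from this tree. It assumes a later kernel in which the syscall is named perf_event_open(2) and the attribute struct is struct perf_event_attr (the tree discussed here still uses the perf_counter naming). The helper names perf_open(), measure_task_clock() and spin() are made up for the example. It counts a software event (task clock), so it should not depend on PMU support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc ships no wrapper, so issue the syscall directly. */
static int perf_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                     int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Count the task-clock time (in nanoseconds) that work() consumes in the
 * calling thread. Returns 0 and stores the count in *out on success, or -1
 * if the counter could not be created or read (for example when the kernel
 * lacks support or a sandbox forbids the syscall).
 */
static int measure_task_clock(void (*work)(void), uint64_t *out)
{
    struct perf_event_attr attr;
    int fd;
    ssize_t n;

    assert(work != NULL && out != NULL);

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;          /* software event, no PMU needed */
    attr.config = PERF_COUNT_SW_TASK_CLOCK;
    attr.disabled = 1;                       /* start stopped, enable below */
    attr.exclude_kernel = 1;                 /* usually required when unprivileged */
    attr.exclude_hv = 1;

    /* pid 0 and cpu -1: measure the calling thread on whichever CPU it runs. */
    fd = perf_open(&attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With the default read_format, read() returns a single u64 count. */
    n = read(fd, out, sizeof(*out));
    close(fd);
    return n == (ssize_t)sizeof(*out) ? 0 : -1;
}

/* Hypothetical workload: burn some CPU time in user space. */
static void spin(void)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 10000000UL; i++)
        x += i;
}
```

Calling measure_task_clock(spin, &ns) from main() should leave the task-clock time in ns on success. A return of -1 typically means the kernel or the environment refused the counter.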

Perfcounters are supported on PowerPC and on x86 (sw counters are
unconditionally supported, hw counters are supported on Intel Core2
and later, and on most AMD CPUs).

The subsystem also offers the 'perf' tool in tools/perf/, which is
an integrated collection of subcommands that allow various levels of
inspection and analysis - from the ten-miles-high statistics to the
per assembly statistics and to the raw trace itself.

The tooling tries to be friendly, intuitive and very fast - and does
not try to get in the way of getting work done. Its main focus is on
measuring/profiling user-space apps, without an artificial separation
of 'kernel-space' and 'user-space' activities. There's ELF symbol
and debuginfo/annotation support.

The counter concept got objected to in past discussions on lkml, by
DaveM and by Stephane Eranian (i've Cc:-ed them) - so this code was
not eligible for linux-next testing - nevertheless we gave it good
testing on PowerPC and x86 and i've done a wide cross-build test as
well to try to make sure it breaks no other architecture.

Any architecture can add support for perfcounters by hooking up
the syscall. Even without any PMU feature, the software events and
hrtimer based sampling will be available and all the tools will work
transparently.
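
The fall-back this relies on can be sketched from the user-space side as follows. This is an assumption-laden illustration, not the code in tools/perf/: it uses the later perf_event_open(2) naming rather than the perf_counter naming of this tree, and sys_perf_event_open(), try_open() and open_cycles_or_cpu_clock() are hypothetical names. It first asks for a hardware cycle counter and, if that fails, retries with the hrtimer driven cpu-clock software event.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the raw syscall. */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Try to open one sampling counter for the calling thread; -1 on failure. */
static int try_open(unsigned int type, unsigned long long config)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = type;
    attr.config = config;
    attr.freq = 1;                  /* interpret sample_freq as a rate in Hz */
    attr.sample_freq = 1000;        /* assumed rate, chosen for the example */
    attr.sample_type = PERF_SAMPLE_IP;
    attr.exclude_kernel = 1;        /* usually required when unprivileged */
    attr.exclude_hv = 1;
    attr.disabled = 1;              /* caller enables it when ready */

    return sys_perf_event_open(&attr, 0, -1, -1, 0);
}

/*
 * Prefer hardware cycles; fall back to the cpu-clock software event when no
 * usable cycle counter exists. *used_fallback is set to 1 if the software
 * event was attempted. Returns the counter fd, or -1 if both attempts fail.
 */
static int open_cycles_or_cpu_clock(int *used_fallback)
{
    int fd;

    assert(used_fallback != NULL);

    *used_fallback = 0;
    fd = try_open(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES);
    if (fd >= 0)
        return fd;

    *used_fallback = 1;
    return try_open(PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK);
}
```

The sketch stops once it has an fd. Actually consuming the samples would additionally require mmap()ing the counter's ring buffer, which is left out here.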

I tested its merge interaction with pending trees in linux-next: it
only conflicted with the scheduler and x86 trees (which trees i
already sent earlier today), and a trivial conflict with two pending
cleanup patches in atomic.h. I.e. the interaction is minimal.

CONFIG_PERF_COUNTER is default-disabled.

The tree shows some more frequent merge commits a few months ago -
later portions were kept as clean as the other -tip branches. There's
one stray-looking (and harmless) commit i noticed:

b09d250: mutex: drop "inline" from mutex_lock() inside kernel/mutex.c

This got there due to the atomic_dec_and_mutex_lock() API being first
used in and originating from perfcounters. That API went upstream in
the previous merge window already, it was cherry-picked out of the
perfcounters tree, the upstream commit is:

b1fca26: mutex: add atomic_dec_and_mutex_lock()

There are also interactions with the irqinit unification which went
upstream yesterday. (There were so many non-trivial conflicts that i
pre-merged your current latest tree into this tree and tested the
result.)

Please pull if you find it acceptable.

Thanks,

Ingo

------------------>

Arjan van de Ven (2):
perf_counter tools: Warning fixes on 32-bit
perf_counter tools: Initialize a stack variable before use

Arnaldo Carvalho de Melo (26):
perf record: Allow specifying a pid to record
perf_counter: First part of 'perf report' conversion to C + elfutils
perf_counter: Implement dso__load using libelf
perf_counter: Use rb_trees in perf report
perf_counter: Add our private copy of list.h
perf_counter: Use rb_tree for symhists and threads in report
perf report: Fix kernel symbol resolution
perf: Don't assume /proc/kallsyms is ordered
perf report: Sort output by symbol usage
perf report: Use hex2long instead of sscanf
perf report: Only load text symbols from kallsyms
perf report: Show the IP only in --verbose mode
perf_counter tools: Move symbol resolution classes from report to libperf
perf_counter tools: struct symbol priv area
perf_counter tools: Consolidate dso methods to load kernel symbols
perf_counter tools: Optionally pass a symbol filter to the dso load routines
perf_counter tools: Convert builtin-top to use libperf symbol routines
perf_counter tools: Shorten the DSO names using cwd
perf_counter tools: Add locking to perf top
perf_counter tools: Add string.[ch]
perf_counter tools: Use hex2u64 in more places
perf_counter tools: Add missing rb_erase in dso__delete_symbols
perf_counter tools: Cover PLT symbols too
perf_counter tools: Fix off-by-one bug in symbol__new
perf report: Fix rbtree bug
perf report: Add -vvv to print the list of threads and its mmaps

Erdem Aktas (1):
perf_counter tools: fix buffer overwrite problem for perf top command

Eric Paris (1):
mutex: add atomic_dec_and_mutex_lock()

Frederic Weisbecker (4):
perf_counter: Sleep before refresh using poll in perf top
perf_counter tools: Fix warn_unused_result warnings
perf top: Fix zero or negative refresh delay
perf top: Wait for a minimal set of events before reading first snapshot

H. Peter Anvin (1):
mutex: drop "inline" from mutex_lock() inside kernel/mutex.c

Hidetoshi Seto (1):
x86: smarten /proc/interrupts output for new counters

Ingo Molnar (194):
performance counters: documentation
performance counters: x86 support
x86, perfcounters: read out MSR_CORE_PERF_GLOBAL_STATUS with counters disabled
perfcounters: select ANON_INODES
perfcounters, x86: simplify disable/enable of counters
perfcounters, x86: clean up debug code
perfcounters: consolidate global-disable codepaths
perf counters: restructure the API
perf counters: add support for group counters
perf counters: group counter, fixes
perf counters: hw driver API
perf counters: implement PERF_COUNT_CPU_CLOCK
perf counters: consolidate hw_perf save/restore APIs
perf counters: implement PERF_COUNT_TASK_CLOCK
perf counters: add prctl interface to disable/enable counters
perf counters: clean up state transitions
perf counters: update docs
x86: implement atomic64_t on 32-bit
perfcounters: restructure x86 counter math
perfcounters: implement "counter inheritance"
perfcounters: fix task clock counter
perfcounters: add context switch counter
perfcounters: add task migrations counter
perfcounters: add nr-of-faults counter
perfcounters: fix non-intel-perfmon CPUs
perfcounters, x86: fix sw counters on non-PMC CPUs
perfcounters: fix lapic initialization
perfcounters: release CPU context when exiting task counters
perfcounters: flush on setuid exec
perfcounters: use hw_event.disable flag
perfcounters: remove warnings
perfcounters: tweak group scheduling
x86, perfcounters: rename intel_arch_perfmon.h => perf_counter.h
x86, perfcounters: prepare for fixed-mode PMCs
perfcounters: add fixed-mode PMC enumeration
x86, perfcounters: refactor code for fixed-function PMCs
perfcounters: hw ops rename
perfcounters: fix task clock counter
perfcounters: pull inherited counters
perfcounters: fix init context lock
perfcounters: enable lowlevel pmc code to schedule counters
x86, perfcounters: print out the ->used bitmask
perfcounters: remove ->nr_inherited
perfcounters: generalize the counter scheduler
perfcounters: add PERF_COUNT_BUS_CYCLES
x86, perfcounters: add support for fixed-function pmcs
perfcounters: include asm/perf_counter.h only if CONFIG_PERF_COUNTERS=y
perfcounters: fix "perf counters kills oprofile" bug, v2
perfcounters: remove duplicate definition of LOCAL_PERF_VECTOR
perfcounters: fix acpi_idle_do_entry() workaround
perfcounters: fix reserved bits sizing
perf_counter: fix crash on perfmon v1 systems
perf_counter: create Documentation/perf_counter/ and move perfcounters.txt there
perf_counter: add sample user-space to Documentation/perf_counter/
perf_counter tools: tidy up in-kernel dependencies
perf_counter tools: fix build warning in kerneltop.c
perf_counter tools: increase cpu-cycles again
x86, perfcounters: add atomic64_xchg()
perf_counter: fix off task->comm by one
perf_counter tools: include PID in perf-report output, tweak user/kernel printut
perf_counter: copy in Git's top Makefile
perf_counter tools: add in basic glue from Git
perf_counter tools: clean up after introduction of the Git command framework
perf_counter tools: separate kerneltop into 'perf top' and 'perf stat'
perf_counter tools: add help texts
perf_counter tools: add 'perf record' command
perf_counter tools: fix --version
perf_counter tools: add 'perf help'
perf_counter tools: fix 'make install'
perfcounters, sched: remove __task_delta_exec()
perf_counter tools: move helper library to util/*
perf_counter: add/update copyrights
perf_counter tools: add perf-report to the Makefile
perf_counter tools: perf stat: make -l default-on
perf_counter tools: fix infinite loop in perf-report on zeroed event records
perf_counter tools: fix x86 syscall numbers
perf_counter: round-robin per-CPU counters too
perf_counter: initialize the per-cpu context earlier
perf_counter: convert perf_resource_mutex to a spinlock
perf_counter: fix fixed-purpose counter support on v2 Intel-PERFMON
perf_counter tools: remove debug code from builtin-stat.c
perf_counter: x86: Fix throttling
perf_counter: x86: Disallow interval of 1
perf_counter: x86: Protect against infinite loops in intel_pmu_handle_irq()
perf_counter: Remove ACPI quirk
perf stat: handle Ctrl-C
perf_counter: fix threaded task exit
perf_counter, x86: fix zero irq_period counters
perf_counter, x86: speed up the scheduling fast-path
perf_counter: fix counter freeing logic
perf_counter: fix counter inheritance race
perf_counter: Fix context removal deadlock
perf_counter: fix !PERF_COUNTERS build failure
perf_counter tools: increase limits
perf_counter: Increase mmap limit
perf_counter tools: increase limits, fix
perf_counter: Move child perfcounter init to after scheduler init
perf stat: flip around ':k' and ':u' flags
Revert "perf_counter, x86: speed up the scheduling fast-path"
perf_counter: fix warning & lockup
perf_counter, x86: Fix APIC NMI programming
perf_counter, x86: Make NMI lockups more robust
perf_counter: Initialize ->oncpu properly
perf record: Straighten out argv types
perf stat: Remove unused variable
perf record: Convert to Git option parsing
perf_counter tools: Librarize event string parsing
perf stat: Convert to Git option parsing
perf top: Convert to Git option parsing
perf_counter tools: remove the standalone perf-report utility
perf record: Convert to Git option parsing
perf report: Add help/manpage
perf report: add --dump-raw-trace option
perf report: add counter for unknown events
perf report: add more debugging
perf report: Only load text symbols from kallsyms, fix
perf_counter tools: Introduce stricter C code checking
perf_counter tools: Rename output.perf to perf.data
perf_counter tools: Add built-in pager support
perf report: Remove <ctype.h> include
pref_counter: tools: report: Add header printout & prettify
pref_counter: tools: report: Robustify in case of weird events
perf_counter: Fix perf_counter_init_task() on !CONFIG_PERF_COUNTERS
perf_counter tools: report: Add help text for --sort
perf_counter tools: Clean up builtin-stat.c's do_perfstat()
perf_counter tools: Split display into reading and printing
perf_counter tools: Also display time-normalized stat results
perf_counter: Fix cpuctx->task_ctx races
perf_counter: Robustify counter-free logic
perf_counter tools: Print 'CPU utilization factor' in builtin-stat
perf_counter tools: Fix 'make install'
perf_counter tools: Generate per command manpages (and pdf/html, etc.)
perf_counter tools: Fix unknown command help text
perf_counter: Tidy up style details
perf report: Clean up the default output
perf report: Fix column width/alignment of dsos
perf record: Add --append option
perf record: Increase mmap buffering default
perf report: Print more info instead of <unknown> entries
perf_counter tools: Make source code headers more coherent
perf record: Print out the number of events captured
perf report: Print -D to stdout
perf report: Improve sort key recognition
perf report: Handle vDSO symbols properly
perf_counter tools: Clean up old kerneltop references
perf record: Refine capture printout
perf report: Display 100% correctly
perf stat: Print out all arguments
perf report: Add front-entry cache for lookups
perf help: Fix bug when there's no perf-* command around
perf_counter tools: Optimize harder
perf_counter tools: Work around warnings in older GCCs
perf_counter: Fix throttling lock-up
perf report: Clean up event processing
perf report: Split out event processing helpers
perf report: Handle all known event types
perf top: Reduce default filter threshold
perf record/report: Fix PID/COMM handling
perf_counter tools: Build with native optimization
perf_counter tools: Print out symbol parsing errors only if --verbose
perf report: Print out the total number of events
perf_counter tools: Add color terminal output support
perf_counter tools: Dont output in color on !tty
perf report: Bail out if there are unrecognized options/arguments
perf stat: Update help text
perf record: Split out counter creation into a helper function
perf record, top: Implement --freq
perf report: Display user/kernel differentiator
perf_counter tools: Clarify events/samples naming
perf_counter tools: Remove -march=native
perf_counter tools: Sample and display frequency adjustment changes
perf record: Set frequency correctly
perf_counter: Separate out attr->type from attr->config
perf_counter: Implement generalized cache event types
perf_counter tools: Fix cache-event printout
perf_counter tools: Uniform help printouts
perf_counter tools: Tidy up manpage details
perf_counter tools: Prepare for 'perf annotate'
perf_counter tools: Add 'perf annotate' feature
perf_counter tools: Move from Documentation/perf_counter/ to tools/perf/
perf_counter tools: Fix error condition in parse_aliases()
perf annotate: Automatically pick up vmlinux in the local directory
perf annotate: Fix command line help text
perf stat: Continue even on counter creation error
perf top: Fall back to cpu-clock-tick hrtimer sampling if no cycle counter available
perf record: Fall back to cpu-clock-ticks if no PMU
perf_counter tools: Handle kernels with !CONFIG_PERF_COUNTER
perf report: Print more expressive message in case of file open error
perf stat: Print out instructins/cycle metric
perf_counter: Clean up x86 boot messages
perf_counter tools: Standardize color printing
perf_counter tools: Clean up u64 usage
perf_counter: Better align code
perf_counter: Turn off by default

Jaswinder Singh (1):
x86: perf_counter.c intel_perfmon_event_map and max_intel_perfmon_events should be static

Jaswinder Singh Rajput (7):
x86: perf_counter remove unwanted hw_perf_enable_all
x86: irqinit_32.c fix compilation warning
x86: prepare perf_counter to add more cpus
x86: AMD Support for perf_counter
x86: decent declarations in perf_counter.c
x86: use pr_info in perf_counter.c
x86: perf_counter cleanup

Luis Henriques (1):
perf_counter: fix alignment in /proc/interrupts

Mike Galbraith (23):
perfcounters: throttle on too high IRQ rates
perfcounters: ratelimit performance counter interrupts
perfcounters fix section mismatch warning in perf_counter.c::perf_counters_lapic_init()
perfcounters: fix refcounting bug
perfcounters: fix "perf counters kill oprofile" bug
perf_counters: account NMI interrupts
perfcounters: fix use after free in perf_release()
perf_counter tools: kerneltop: add real-time data acquisition thread
perf_counter tools: kerneltop: display per function percentage along with event count
perf_counter tools: fix build error
perf_counter, x86: clean up throttling printk
perf top: fix segfault
perf top: Reduce display overhead
perf top: Remove leftover NMI/IRQ bits
perf top: fix typo in -d option
perf record: Fix the profiling of existing pid or whole box
perf_counter tools: Document '--' option parsing terminator
perf_counter tools: Fix top symbol table dump typo
perf_counter tools: Fix top symbol table max_ip typo
perf_counter tools: Guard against record damaging existing files
perf_counter tools: Make .gitignore reflect perf_counter tools files
perf_counter tools: Cleanup Makefile
perf_counter tools: Fix uninitialized variable in perf-report.c

Paul Mackerras (64):
perf_counter: Fix return value from dummy hw_perf_counter_init
perf_counter: Fix the cpu_clock software counter
perf_counter: Add optional hw_perf_group_sched_in arch function
perf_counter: Add dummy perf_counter_print_debug function
powerpc/perf_counter: Add perf_counter system call on powerpc
powerpc: Provide a way to defer perf counter work until interrupts are enabled
powerpc/perf_counter: Add generic support for POWER-family PMU hardware
powerpc/perf_counter: Add support for PPC970 family
powerpc/perf_counter: Add support for POWER6
perf_counter: Always schedule all software counters in
powerpc/perf_counter: Make sure PMU gets enabled properly
perf_counter: Add support for pinned and exclusive counter groups
perf_counter: Add counter enable/disable ioctls
perf_counters: make software counters work as per-cpu counters
perf_counters: allow users to count user, kernel and/or hypervisor events
perfcounters: fix refcounting bug, take 2
perfcounters: make context switch and migration software counters work again
perfcounters/powerpc: Make exclude_kernel bit work on Apple G5 processors
perfcounters/powerpc: Add support for POWER5 processors
perfcounters: fix a few minor cleanliness issues
perfcounters: provide expansion room in the ABI
perfcounters/powerpc: fix oops with multiple counters in a group
perfcounters/powerpc: add support for POWER5+ processors
perfcounters/powerpc: add support for POWER4 processors
perf_counter: abstract wakeup flag setting in core to fix powerpc build
perf_counter: powerpc: clean up perc_counter_interrupt
perf_counter: fix type/event_id layout on big-endian systems
perf_counter: add an mmap method to allow userspace to read hardware counters
perf_counter tools: remove glib dependency and fix bugs in kerneltop.c
perf_counter: update documentation
perf_counter: record time running and time enabled for each counter
perf_counter: powerpc: only reserve PMU hardware when we need it
perf_counter: make it possible for hw_perf_counter_init to return error codes
perf_counter tools: optionally scale counter values in perfstat mode
perf_counter: fix powerpc build
perf_counter: powerpc: set sample enable bit for marked instruction events
perf_counter: add MAINTAINERS entry
perf_counter: powerpc: add nmi_enter/nmi_exit calls
perf_counter: powerpc: allow use of limited-function counters
perf_counter: update copyright notice
perf_counter: Put whole group on when enabling group leader
perf_counter: don't count scheduler ticks as context switches
perf_counter: call atomic64_set for counter->count
perf_counter: call hw_perf_save_disable/restore around group_sched_in
perf_counter: powerpc: use u64 for event codes internally
perf_counter: allow arch to supply event misc flags and instruction pointer
perf_counter: powerpc: supply more precise information on counter overflow events
perf_counter: powerpc: initialize cpuhw pointer before use
perf_counter: Dynamically allocate tasks' perf_counter_context struct
perf_counter: Optimize context switch between identical inherited contexts
perf_counter: powerpc: Implement interrupt throttling
perf_counter: Fix race in attaching counters to tasks and exiting
perf_counter: Don't swap contexts containing locked mutex
perf_counter: Provide functions for locking and pinning the context for a task
perf_counter: Allow software counters to count while task is not running
perf_counter: Initialize per-cpu context earlier on cpu up
perf_counter: Fix cpu migration counter
perf_counter: Remove unused prev_state field
perf_counter: powerpc: Fix event alternative code generation on POWER5/5+
perf_counter: powerpc: Fix race causing "oops trying to read PMC0" errors
perf_counter: powerpc: Use new identifier names in powerpc-specific code
perf_counter: Fix lockup with interrupting counters
perf_counters: powerpc: Add support for POWER7 processors
perf_counter: powerpc: Implement generalized cache events for POWER processors

Pekka Enberg (1):
perf report: Add support for profiling JIT generated code

Peter Zijlstra (183):
perfcounters: IRQ and NMI support on AMD CPUs
perfcounters: IRQ and NMI support on AMD CPUs, fix
x86: perf_counter cleanup
perf_counter: x86: fix 32-bit irq_period assumption
perf_counter: use list_move_tail()
perf_counter: add comment to barrier
perf_counter: x86: use ULL postfix for 64bit constants
perf_counter: software counter event infrastructure
perf_counter: provide pagefault software events
perf_counter: provide major/minor page fault software events
perf_counter: hrtimer based sampling for software time events
perf_counter: add an event_list
perf_counter: fix hrtimer sampling
perf_counter: fix uninitialized usage of event_list
perf_counter: generic context switch event
perf_counter: fix up counter free paths
perf_counter: hook up the tracepoint events
perf_counter: revamp syscall input ABI
perf_counter: unify irq output code
perf_counter: remove the event config bitfields
perf_counter: avoid recursion
perf_counter: new output ABI - part 1
perf_counter tools: update to new syscall ABI
perf_counter tools: use mmap() output
perf_counter tools: remove glib dependency and fix bugs in kerneltop.c, fix poll()
perf_counter: fix perf_poll()
perf_counter: more elaborate write API
perf_counter: output objects
perf_counter: sanity check on the output API
perf_counter: optionally provide the pid/tid of the sampled task
perf_counter: kerneltop: mmap_pages argument
perf_counter: kerneltop: output event support
perf_counter: allow and require one-page mmap on counting counters
perf_counter: unify and fix delayed counter wakeup
perf_counter: fix update_userpage()
perf_counter: kerneltop: simplify data_head read
perf_counter: executable mmap() information
perf_counter: kerneltop: parse the mmap data stream
perf_counter: x86: proper error propagation for the x86 hw_perf_counter_init()
perf_counter: small cleanup of the output routines
perf_counter: re-arrange the perf_event_type
perf_counter tools: kerneltop: update event_types
perf_counter: provide generic callchain bits
perf_counter: x86: callchain support
perf_counter: pmc arbitration
perf_counter: move the event overflow output bits to record_type
perf_counter: per event wakeups
perf_counter: kerneltop: update to new ABI
perf_counter: add more context information
perf_counter: update mmap() counter read
perf_counter: update mmap() counter read, take 2
perf_counter: add more context information
perf_counter: SIGIO support
perf_counter: generalize pending infrastructure
perf_counter: x86: self-IPI for pending work
perf_counter: theres more to overflow than writing events
perf_counter: fix the mlock accounting
perf_counter: PERF_RECORD_TIME
perf_counter: counter overflow limit
perf_counter: comment the perf_event_type stuff
perf_counter: change event definition
perf_counter: rework context time
perf_counter: rework the task clock software counter
perf_counter: remove rq->lock usage
perf_counter: minimize context time updates
perf_counter: fix NMI race in task clock
perf_counter: provide misc bits in the event header
perf_counter: use misc field to widen type
perf_counter: kerneltop: keep up with ABI changes
perf_counter: add some comments
perf_counter: track task-comm data
perf_counter: some simple userspace profiling
perf_counter: move PERF_RECORD_TIME
perf_counter: allow for data addresses to be recorded
perf_counter: optimize mmap/comm tracking
perf_counter: sysctl for system wide perf counters
perf_counter: log full path names
perf_counter tools: fix Documentation/perf_counter build error
perf_counter: fix race in perf_output_*
perf_counter: fix nmi-watchdog interaction
perf_counter: tool: handle 0-length data files
perf_counter: documentation update
perf_counter: x86: fixup nmi_watchdog vs perf_counter boo-boo
perf_counter: uncouple data_head updates from wakeups
perf_counter: add ioctl(PERF_COUNTER_IOC_RESET)
perf_counter: provide an mlock threshold
perf_counter: fix the output lock
perf_counter: inheritable sample counters
perf_counter: tools: update the tools to support process and inherited counters
perf_counter: optimize perf_counter_task_tick()
perf_counter: rework ioctl()s
perf_counter: add PERF_RECORD_CONFIG
perf_counter: add PERF_RECORD_CPU
perf_counter: fix print debug irq disable
perf_counter: x86: More accurate counter update
perf_counter: x86: Allow unpriviliged use of NMIs
perf_counter: Fix perf_output_copy() WARN to account for overflow
perf_counter: x86: Fix up the amd NMI/INT throttle
perf_counter: Rework the perf counter disable/enable
perf_counter: x86: Robustify interrupt handling
perf_counter: remove perf_disable/enable exports
perf_counter: per user mlock gift
perf_counter: frequency based adaptive irq_period
perf top: update to use the new freq interface
perf_counter: frequency based adaptive irq_period, 32-bit fix
perf_counter: Fix inheritance cleanup code
perf_counter: Fix counter inheritance
perf_counter: Solve the rotate_ctx vs inherit race differently
perf_counter: Log irq_period changes
perf_counter: Optimize disable of time based sw counters
perf_counter: Optimize sched in/out of counters
perf_counter: Fix dynamic irq_period logging
perf_counter: Sanitize counter->mutex
perf_counter: Sanitize context locking
perf_counter: Fix userspace build
perf_counter: Simplify context cleanup
perf_counter: Change pctrl() behaviour
perf_counter: Remove perf_counter_context::nr_enabled
perf_counter: Fix perf-$cmd invokation
perf_counter: Remove unused ABI bits
perf_counter: Make pctrl() affect inherited counters too
perf_counter: Propagate inheritance failures down the fork() path
perf_counter: Fix PERF_COUNTER_CONTEXT_SWITCHES for cpu counters
perf_counter: x86: Expose INV and EDGE bits
perf_counter: x86: Remove interrupt throttle
perf_counter: Generic per counter interrupt throttle
perf report: Fix segfault on unknown symbols
perf report: Fix ELF symbol parsing
perf report: More robust error handling
perf_counter: tools: /usr/lib/debug%s.debug support
perf_counter: tools: report: Add vmlinux support
perf_counter: tools: report: Rework histogram code
perf_counter: tools: report: Dynamic sort/print bits
pref_counter: tools: report: Add --sort option
perf_counter: tools: report: Add comm sorting
pref_counter: tools: report: Add dso sorting
perf_counter tools: report: Implement header output for --sort variants
perf_counter: Fix COMM and MMAP events for cpu wide counters
perf_counter: Clean up task_ctx vs interrupts
perf_counter: Ammend cleanup in fork() fail
perf_counter: Use PID namespaces properly
perf_counter: tools: Expand the COMM,MMAP event synthesizer
perf_counter: tools: Better handle existing data files
perf_counter tools: Remove the last nmi bits
x86: Fix atomic_long_xchg() on 64bit
perf_counter: Add unique counter id
perf_counter: Rename various fields
perf_counter: Remove the last nmi/irq bits
perf_counter: x86: Emulate longer sample periods
perf_counter: Change data head from u32 to u64
perf_counter: Add ioctl for changing the sample period/frequency
perf_counter: Rename perf_counter_hw_event => perf_counter_attr
perf_counter tools: Fix up the ABI shakeup
perf report: Separate out idle threads
perf_counter: Add a comm hook for pure fork()s
perf record: Use long arg for counter period
perf report: Fix comm sorting
perf_counter: Fix race in counter initialization
perf report: Simplify symbol output
perf report: Add consistent spacing rules
perf_counter: Add fork event
perf_counter: Remove munmap stuff
perf_counter tools: Use fork and remove munmap events
x86: Set context.vdso before installing the mapping
perf_counter: Generate mmap events for install_special_mapping()
perf report: Deal with maps
perf_counter: Change PERF_SAMPLE_CONFIG into PERF_SAMPLE_ID
perf_counter: Add PERF_SAMPLE_PERIOD
perf_counter: Fix frequency adjustment for < HZ
perf_counter: Add mmap event hooks to mprotect()
perf_counter: More aggressive frequency adjustment
perf_counter tools: Small frequency related fixes
perf_counter tools: Propagate signals properly
perf_counter: Annotate exit ctx recursion
perf_counter tools: Normalize data using per sample period data
perf_counter: Introduce struct for sample data
perf_counter: Accurate period data
perf_counter: More paranoia settings
perf_counter: Rename perf_counter_limit sysctl
perf_counter: Rename enums
perf_counter: Standardize event names
perf_counter: Rename L2 to LL cache
perf_counter: Add counter->id to the throttle event

Robert Richter (30):
perf_counter, x86: remove X86_FEATURE_ARCH_PERFMON flag for AMD cpus
perf_counter, x86: declare perf_max_counters only for CONFIG_PERF_COUNTERS
perf_counter, x86: add default path to cpu detection
perf_counter, x86: rework pmc_amd_save_disable_all() and pmc_amd_restore_all()
perf_counter, x86: protect per-cpu variables with compile barriers only
perfcounters: rename struct hw_perf_counter_ops into struct pmu
perf_counter, x86: rename struct pmc_x86_ops into struct x86_pmu
perf_counter, x86: make interrupt handler model specific
perf_counter, x86: remove get_status() from struct x86_pmu
perf_counter, x86: remove ack_status() from struct x86_pmu
perf_counter, x86: rename __hw_perf_counter_set_period into x86_perf_counter_set_period
perf_counter, x86: rename intel only functions
perf_counter, x86: modify initialization of struct x86_pmu
perf_counter, x86: make x86_pmu data a static struct
perf_counter, x86: move counter parameters to struct x86_pmu
perf_counter, x86: make pmu version generic
perf_counter, x86: make x86_pmu_read() static inline
perf_counter, x86: rename cpuc->active_mask
perf_counter, x86: generic use of cpuc->active
perf_counter, x86: consistent use of type int for counter index
perf_counter, x86: rework counter enable functions
perf_counter, x86: rework counter disable functions
perf_counter, x86: change and remove pmu initialization checks
perf_counter, x86: implement the interrupt handler for AMD cpus
perf_counter, x86: return raw count with x86_perf_counter_update()
perf_counter, x86: introduce max_period variable
perf_counter, x86: remove vendor check in fixed_mode_idx()
perf_counter, x86: remove unused function argument in intel_pmu_get_status()
perf_counter: update 'perf top' documentation
perf_counter, x86: rename bitmasks to ->used_mask and ->active_mask

Steven Whitehouse (1):
perfcounters: export perf_tpcounter_event

Thomas Gleixner (15):
performance counters: core code
perf counters: protect them against CSTATE transitions
perf counters: clean up 'raw' type API
perf counters: expand use of counter->event
perf_counter tools: remove build generated files
perfcounter tools: move common defines ... to local header file
perfcounter tools: make rdclock an inline function
perfcounter tools: fix pointer mismatch
perfcounter tools: get the syscall number from arch/*/include/asm/unistd.h
perf_counter tools: Add 'perf list' to list available events
perf_counter tools: Add help for perf list
perf_counter, x86: Implement generalized cache event types, add Core2 support
perf_counter, x86: Implement generalized cache event types, add Atom support
perf_counter, x86: Implement generalized cache event types, add AMD support
perf_counter, x86: Clean up hw_cache_event ids copies

Tim Blechmann (1):
perf_counter: include missing header

Wu Fengguang (9):
perf_counter tools: Merge common code into perfcounters.h
perf_counter tools: Move perfstat supporting code into perfcounters.h
perf_counter tools: support symbolic event names in kerneltop
perf_counter tools: Reuse event_name() in kerneltop
perf_counter tools: move remaining code into kerneltop.c
perf_counter tools: fix comment for sym_weight()
perf_counter tools: fix event_id type
perf_counter tools: cut down default count for cpu-cycles
perf_counter tools: when no command is feed to perfstat, display help and exit

Yinghai Lu (2):
perf_counter: more barrier in blank weak function
x86: make irqinit_32.c more like irqinit_64.c, v2

Yong Wang (6):
perf_counter/x86: Always use NMI for performance-monitoring interrupt
perf_counter/x86: Remove the IRQ (non-NMI) handling bits
perf_counter: Documentation update
perf_counter tools: Fix incorrect printf formats
perf_counter, x86: Correct some event and umask values for Intel processors
perf_counter/x86: Fix the model number of Intel Core2 processors


MAINTAINERS | 10 +
arch/powerpc/include/asm/hw_irq.h | 39 +
arch/powerpc/include/asm/paca.h | 1 +
arch/powerpc/include/asm/perf_counter.h | 98 +
arch/powerpc/include/asm/reg.h | 2 +
arch/powerpc/include/asm/systbl.h | 2 +-
arch/powerpc/include/asm/unistd.h | 1 +
arch/powerpc/kernel/Makefile | 3 +
arch/powerpc/kernel/asm-offsets.c | 1 +
arch/powerpc/kernel/entry_64.S | 9 +
arch/powerpc/kernel/irq.c | 5 +
arch/powerpc/kernel/perf_counter.c | 1263 ++++++
arch/powerpc/kernel/power4-pmu.c | 598 +++
arch/powerpc/kernel/power5+-pmu.c | 671 ++++
arch/powerpc/kernel/power5-pmu.c | 611 +++
arch/powerpc/kernel/power6-pmu.c | 532 +++
arch/powerpc/kernel/power7-pmu.c | 357 ++
arch/powerpc/kernel/ppc970-pmu.c | 482 +++
arch/powerpc/mm/fault.c | 10 +-
arch/powerpc/platforms/Kconfig.cputype | 1 +
arch/x86/Kconfig | 1 +
arch/x86/ia32/ia32entry.S | 3 +-
arch/x86/include/asm/atomic_32.h | 236 ++
arch/x86/include/asm/entry_arch.h | 2 +-
arch/x86/include/asm/hardirq.h | 2 +
arch/x86/include/asm/hw_irq.h | 2 +
arch/x86/include/asm/intel_arch_perfmon.h | 31 -
arch/x86/include/asm/irq_vectors.h | 8 +-
arch/x86/include/asm/perf_counter.h | 100 +
arch/x86/include/asm/unistd_32.h | 1 +
arch/x86/include/asm/unistd_64.h | 3 +-
arch/x86/kernel/apic/apic.c | 3 +
arch/x86/kernel/cpu/Makefile | 12 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/perf_counter.c | 1704 ++++++++
arch/x86/kernel/cpu/perfctr-watchdog.c | 4 +-
arch/x86/kernel/entry_64.S | 5 +
arch/x86/kernel/irq.c | 10 +
arch/x86/kernel/irqinit.c | 15 +-
arch/x86/kernel/signal.c | 1 -
arch/x86/kernel/syscall_table_32.S | 1 +
arch/x86/kernel/traps.c | 12 +-
arch/x86/mm/fault.c | 12 +-
arch/x86/oprofile/nmi_int.c | 7 +-
arch/x86/oprofile/op_model_ppro.c | 10 +-
arch/x86/vdso/vdso32-setup.c | 6 +-
arch/x86/vdso/vma.c | 7 +-
drivers/char/sysrq.c | 2 +
fs/exec.c | 9 +
include/asm-generic/atomic.h | 2 +-
include/linux/init_task.h | 10 +
include/linux/kernel_stat.h | 5 +
include/linux/perf_counter.h | 697 ++++
include/linux/prctl.h | 3 +
include/linux/sched.h | 21 +-
include/linux/syscalls.h | 5 +
init/Kconfig | 34 +
kernel/Makefile | 1 +
kernel/exit.c | 16 +-
kernel/fork.c | 12 +
kernel/mutex.c | 2 +-
kernel/perf_counter.c | 4260 +++++++++++++++++++++
kernel/sched.c | 57 +-
kernel/sys.c | 7 +
kernel/sys_ni.c | 3 +
kernel/sysctl.c | 27 +
kernel/timer.c | 3 +
mm/mmap.c | 5 +
mm/mprotect.c | 2 +
tools/perf/.gitignore | 16 +
tools/perf/Documentation/Makefile | 300 ++
tools/perf/Documentation/asciidoc.conf | 91 +
tools/perf/Documentation/manpage-1.72.xsl | 14 +
tools/perf/Documentation/manpage-base.xsl | 35 +
tools/perf/Documentation/manpage-bold-literal.xsl | 17 +
tools/perf/Documentation/manpage-normal.xsl | 13 +
tools/perf/Documentation/manpage-suppress-sp.xsl | 21 +
tools/perf/Documentation/perf-annotate.txt | 29 +
tools/perf/Documentation/perf-help.txt | 38 +
tools/perf/Documentation/perf-list.txt | 25 +
tools/perf/Documentation/perf-record.txt | 42 +
tools/perf/Documentation/perf-report.txt | 26 +
tools/perf/Documentation/perf-stat.txt | 66 +
tools/perf/Documentation/perf-top.txt | 39 +
tools/perf/Documentation/perf.txt | 24 +
tools/perf/Makefile | 929 +++++
tools/perf/builtin-annotate.c | 1356 +++++++
tools/perf/builtin-help.c | 461 +++
tools/perf/builtin-list.c | 20 +
tools/perf/builtin-record.c | 582 +++
tools/perf/builtin-report.c | 1316 +++++++
tools/perf/builtin-stat.c | 367 ++
tools/perf/builtin-top.c | 736 ++++
tools/perf/builtin.h | 26 +
tools/perf/command-list.txt | 10 +
tools/perf/design.txt | 442 +++
tools/perf/perf.c | 428 +++
tools/perf/perf.h | 67 +
tools/perf/util/PERF-VERSION-GEN | 42 +
tools/perf/util/abspath.c | 117 +
tools/perf/util/alias.c | 77 +
tools/perf/util/cache.h | 119 +
tools/perf/util/color.c | 241 ++
tools/perf/util/color.h | 36 +
tools/perf/util/config.c | 873 +++++
tools/perf/util/ctype.c | 26 +
tools/perf/util/environment.c | 9 +
tools/perf/util/exec_cmd.c | 165 +
tools/perf/util/exec_cmd.h | 13 +
tools/perf/util/generate-cmdlist.sh | 24 +
tools/perf/util/help.c | 367 ++
tools/perf/util/help.h | 29 +
tools/perf/util/levenshtein.c | 84 +
tools/perf/util/levenshtein.h | 8 +
tools/perf/util/list.h | 603 +++
tools/perf/util/pager.c | 99 +
tools/perf/util/parse-events.c | 316 ++
tools/perf/util/parse-events.h | 17 +
tools/perf/util/parse-options.c | 508 +++
tools/perf/util/parse-options.h | 174 +
tools/perf/util/path.c | 353 ++
tools/perf/util/quote.c | 481 +++
tools/perf/util/quote.h | 68 +
tools/perf/util/rbtree.c | 383 ++
tools/perf/util/rbtree.h | 171 +
tools/perf/util/run-command.c | 395 ++
tools/perf/util/run-command.h | 93 +
tools/perf/util/sigchain.c | 52 +
tools/perf/util/sigchain.h | 11 +
tools/perf/util/strbuf.c | 359 ++
tools/perf/util/strbuf.h | 137 +
tools/perf/util/string.c | 34 +
tools/perf/util/string.h | 8 +
tools/perf/util/symbol.c | 641 ++++
tools/perf/util/symbol.h | 47 +
tools/perf/util/usage.c | 80 +
tools/perf/util/util.h | 410 ++
tools/perf/util/wrapper.c | 206 +
138 files changed, 27406 insertions(+), 85 deletions(-)
create mode 100644 arch/powerpc/include/asm/perf_counter.h
create mode 100644 arch/powerpc/kernel/perf_counter.c
create mode 100644 arch/powerpc/kernel/power4-pmu.c
create mode 100644 arch/powerpc/kernel/power5+-pmu.c
create mode 100644 arch/powerpc/kernel/power5-pmu.c
create mode 100644 arch/powerpc/kernel/power6-pmu.c
create mode 100644 arch/powerpc/kernel/power7-pmu.c
create mode 100644 arch/powerpc/kernel/ppc970-pmu.c
delete mode 100644 arch/x86/include/asm/intel_arch_perfmon.h
create mode 100644 arch/x86/include/asm/perf_counter.h
create mode 100644 arch/x86/kernel/cpu/perf_counter.c
create mode 100644 include/linux/perf_counter.h
create mode 100644 kernel/perf_counter.c
create mode 100644 tools/perf/.gitignore
create mode 100644 tools/perf/Documentation/Makefile
create mode 100644 tools/perf/Documentation/asciidoc.conf
create mode 100644 tools/perf/Documentation/manpage-1.72.xsl
create mode 100644 tools/perf/Documentation/manpage-base.xsl
create mode 100644 tools/perf/Documentation/manpage-bold-literal.xsl
create mode 100644 tools/perf/Documentation/manpage-normal.xsl
create mode 100644 tools/perf/Documentation/manpage-suppress-sp.xsl
create mode 100644 tools/perf/Documentation/perf-annotate.txt
create mode 100644 tools/perf/Documentation/perf-help.txt
create mode 100644 tools/perf/Documentation/perf-list.txt
create mode 100644 tools/perf/Documentation/perf-record.txt
create mode 100644 tools/perf/Documentation/perf-report.txt
create mode 100644 tools/perf/Documentation/perf-stat.txt
create mode 100644 tools/perf/Documentation/perf-top.txt
create mode 100644 tools/perf/Documentation/perf.txt
create mode 100644 tools/perf/Makefile
create mode 100644 tools/perf/builtin-annotate.c
create mode 100644 tools/perf/builtin-help.c
create mode 100644 tools/perf/builtin-list.c
create mode 100644 tools/perf/builtin-record.c
create mode 100644 tools/perf/builtin-report.c
create mode 100644 tools/perf/builtin-stat.c
create mode 100644 tools/perf/builtin-top.c
create mode 100644 tools/perf/builtin.h
create mode 100644 tools/perf/command-list.txt
create mode 100644 tools/perf/design.txt
create mode 100644 tools/perf/perf.c
create mode 100644 tools/perf/perf.h
create mode 100755 tools/perf/util/PERF-VERSION-GEN
create mode 100644 tools/perf/util/abspath.c
create mode 100644 tools/perf/util/alias.c
create mode 100644 tools/perf/util/cache.h
create mode 100644 tools/perf/util/color.c
create mode 100644 tools/perf/util/color.h
create mode 100644 tools/perf/util/config.c
create mode 100644 tools/perf/util/ctype.c
create mode 100644 tools/perf/util/environment.c
create mode 100644 tools/perf/util/exec_cmd.c
create mode 100644 tools/perf/util/exec_cmd.h
create mode 100755 tools/perf/util/generate-cmdlist.sh
create mode 100644 tools/perf/util/help.c
create mode 100644 tools/perf/util/help.h
create mode 100644 tools/perf/util/levenshtein.c
create mode 100644 tools/perf/util/levenshtein.h
create mode 100644 tools/perf/util/list.h
create mode 100644 tools/perf/util/pager.c
create mode 100644 tools/perf/util/parse-events.c
create mode 100644 tools/perf/util/parse-events.h
create mode 100644 tools/perf/util/parse-options.c
create mode 100644 tools/perf/util/parse-options.h
create mode 100644 tools/perf/util/path.c
create mode 100644 tools/perf/util/quote.c
create mode 100644 tools/perf/util/quote.h
create mode 100644 tools/perf/util/rbtree.c
create mode 100644 tools/perf/util/rbtree.h
create mode 100644 tools/perf/util/run-command.c
create mode 100644 tools/perf/util/run-command.h
create mode 100644 tools/perf/util/sigchain.c
create mode 100644 tools/perf/util/sigchain.h
create mode 100644 tools/perf/util/strbuf.c
create mode 100644 tools/perf/util/strbuf.h
create mode 100644 tools/perf/util/string.c
create mode 100644 tools/perf/util/string.h
create mode 100644 tools/perf/util/symbol.c
create mode 100644 tools/perf/util/symbol.h
create mode 100644 tools/perf/util/usage.c
create mode 100644 tools/perf/util/util.h
create mode 100644 tools/perf/util/wrapper.c

[ combo patch left out due to lkml size limits ]


2009-06-11 16:17:26

by Christoph Hellwig

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 06:03:29PM +0200, Ingo Molnar wrote:
> Linus,
>
> Please consider pulling the performance counters Git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git perfcounters-for-linus

Err, no. This adds tons of userspace code into tools/ which
should not be in the kernel tree but a proper package.

2009-06-11 16:27:41

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Christoph Hellwig wrote:
>
> Err, no. This adds tons of userspace code into tools/ which
> should not be in the kernel tree but a proper package.

I disagree.

We've had tons of cases where we tried to "separate" the user-land code
and the kernel code, in the name of "beauty" or whatever.

It's almost invariably a disaster.

Look at oprofile. F*ck me, what a horrid piece of crap. It took literally
months for the user mode tools to catch up and get the patches to support
new functionality into CVS (or is it SVN?), and after that it took even
longer for them to become part of a release and be picked up by
distributions. In fact, I'm not sure it is part of a release even now - I
had to make a bug report to Fedora to get atom and Nehalem support in my
tools: I think they took the unofficial patch.

Or look at the crazy things we used to do for X. It's going away (slowly),
because some of the most incestuous things are actually just being
integrated into the kernel, and so there's less of the "two broken pieces"
approach, and more of a "one working piece" kind of thing.

So I'd much rather have kernel tools with the kernel, than have to depend
on some external entity that doesn't really care.

Linus

2009-06-11 16:35:12

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Linus,

> > Err, no. This adds tons of userspace code into tools/ which
> > should not be in the kernel tree but a proper package.
>
> I disagree.
>
> We've had tons of cases where we tried to "separate" the user-land code
> and the kernel code, in the name of "beauty" or whatever.
>
> It's almost invariably a disaster.
>
> Look at oprofile. F*ck me, what a horrid piece of crap. It took literally
> months for the user mode tools to catch up and get the patches to support
> new functionality into CVS (or is it SVN?), and after that it took even
> longer for them to become part of a release and be picked up by
> distributions. In fact, I'm not sure it is part of a release even now - I
> had to make a bug report to Fedora to get atom and Nehalem support in my
> tools: I think they took the unofficial patch.
>
> Or look at the crazy things we used to do for X. It's going away (slowly),
> because some of the most incestuous things are actually just being
> integrated into the kernel, and so there's less of the "two broken pieces"
> approach, and more of a "one working piece" kind of thing.
>
> So I'd much rather have kernel tools with the kernel, than have to depend
> on some external entity that doesn't really care.

so do you expect us to merge stuff like ip, iw, rfkill, crda, the WiMAX
tools, the Bluetooth ones and whatever else we have that all have the
same issues into the kernel source code as well?

I see no reason this can't be maintained properly outside the kernel
source. You will always have black sheep and screw-ups, but just putting
everything into one single location is not a good idea either. Other
subsystems do this well and so could Ingo.

Also please consider the distro point of view. All these distros
already have a hard time keeping up with the kernel patches etc. It is a lot
easier to update a userspace package than having to provide a patched
kernel source.

Regards

Marcel

2009-06-11 16:39:49

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Marcel Holtmann wrote:
>
> so do you expect us to merge stuff like ip, iw, rfkill, crda, the WiMAX
> tools, the Bluetooth ones and whatever else we have that all have the
> same issues into the kernel source code as well?

No. Only stuff that I expect to be really close to hardware, and used for
kernel purposes.

> Also please consider the distro point of view. All these distros have
> already a hard time to keep up with the kernel patches etc. It is a lot
> easier to update a userspace package than having to provide a patched
> kernel source.

Feel free to split it all up if it turns out to be stable later.

But I refuse to go through another oprofile.

Linus

2009-06-11 16:47:18

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Linus,

> > so do you expect us to merge stuff like ip, iw, rfkill, crda, the WiMAX
> > tools, the Bluetooth ones and whatever else we have that all have the
> > same issues into the kernel source code as well?
>
> No. Only stuff that I expect to be really close to hardware, and used for
> kernel purposes.

and where exactly do we draw the line? It is just not clear to me.

> > Also please consider the distro point of view. All these distros have
> > already a hard time to keep up with the kernel patches etc. It is a lot
> > easier to update a userspace package than having to provide a patched
> > kernel source.
>
> Feel free to split it all up if it turns out to be stable later.
>
> But I refuse to go through another oprofile.

Point taken on why you wanna do it. No questions asked here. However I
still think it is a bad idea to begin with. The perf tool could very
well have its own repository on git.kernel.org and be maintained side by
side with the kernel.

Regards

Marcel

2009-06-11 16:47:29

by Christoph Hellwig

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 09:38:32AM -0700, Linus Torvalds wrote:
> > so do you expect us to merge stuff like ip, iw, rfkill, crda, the WiMAX
> > tools, the Bluetooth ones and whatever else we have that all have the
> > same issues into the kernel source code as well?
>
> No. Only stuff that I expect to be really close to hardware, and used for
> kernel purposes.

Did you take a look at tools/perf/? There is nothing close to hardware
at all. It's all pretty highly abstracted away from anything resembling
the hardware through the perfcounters interface.

2009-06-11 16:52:42

by Al Viro

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 09:26:55AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 11 Jun 2009, Christoph Hellwig wrote:
> >
> > Err, no. This adds tons of userspace code into tools/ which
> > should not be in the kernel tree but a proper package.
>
> I disagree.
>
> We've had tons of cases where we tried to "separate" the user-land code
> and the kernel code, in the name of "beauty" or whatever.
>
> It's almost invariably a disaster.
>
> Look at oprofile. F*ck me, what a horrid piece of crap.

Yes. So's sysfs, so's udev, so's hal, so's any number of revolting
strings of intertwined copulating tapeworms hanging off the kernel's arse.

Do you consider "put into tools/" as permission to change interface at will?
More to the point, do the authors of that stuff consider it as such?

2009-06-11 16:55:27

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Christoph Hellwig wrote:
>
> Did you take a look at tools/perf/? There is nothing close to hardware
> at all. It's all pretty highly abstracted away from anything resembling
> the hardware through the perfcounters interface.

The thing is, the raw perfcounters interface isn't going to be useful as
is. And I have seen where things go when you split them up. So when I get
the choice, I'll go down the road of unproven failure, in the hope that it
will be successful, rather than doing the same mistake once more.

"Insanity: doing the same thing over and over, expecting to get
different results."

And I'm not insane.

Anyway, feel free to disagree. I just don't care.

Linus

2009-06-11 16:56:34

by Peter Zijlstra

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, 2009-06-11 at 17:52 +0100, Al Viro wrote:

> Do you consider "put into tools/" as permission to change interface at will?
> More to the point, do the authors of that stuff consider it as such?

No, once a kernel with this syscall gets released we most certainly
intend to maintain its ABI.

2009-06-11 16:59:34

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Al Viro wrote:
>
> Yes. So's sysfs, so's udev, so's hal, so's any number of revolting
> strings of intertwined copulating tapeworms hanging off the kernel's arse.

Those are about a different thing, though - they are largely about policy.
Very different from something like profiling (or graphics acceleration).

I do like your visuals, though.

Linus

2009-06-11 17:01:07

by Christoph Hellwig

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 06:56:18PM +0200, Peter Zijlstra wrote:
> No, once a kernel with this syscall gets released we most certainly
> intend to maintain its ABI.

So what point is there in keeping it in-tree except making life hell for
packagers?

2009-06-11 17:05:29

by Ray Lee

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 10:00 AM, Christoph Hellwig<[email protected]> wrote:
> On Thu, Jun 11, 2009 at 06:56:18PM +0200, Peter Zijlstra wrote:
>> No, once a kernel with this syscall gets released we most certainly
>> intend to maintain its ABI.
>
> So what point is there in keeping it in-tree except making life hell for
> packagers?

Packagers are quite used to taking a single source tree and building
multiple packages out of it. This isn't rocket science. It's the
multiple separate trees that need to be released in lock-step that are
headaches.

2009-06-11 17:08:00

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Christoph Hellwig wrote:
>
> So what point is there in keeping it in-tree except making life hell for
> packagers?

Give it up. Packagers can trivially generate their own sub-packages. They
do it all the time. They already do it for the user-mode header files,
extracted from the kernel - something you've worked on yourself.

So your point is clearly bogus, and dishonest.

You haven't actually looked the real problem in the eye, and acknowledged
the disaster that is oprofile. Let's give a _new_ approach a chance, and
see if we can avoid the mistakes of yesteryear this time.

Linus

2009-06-11 17:08:22

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Ray,

> >> No, once a kernel with this syscall gets released we most certainly
> >> intend to maintain its ABI.
> >
> > So what point is there in keeping it in-tree except making life hell for
> > packagers?
>
> Packagers are quite used to taking a single source tree and building
> multiple packages out of it. This isn't rocket science. It's the
> multiple separate trees that need to be released in lock-step that are
> headaches.

with the kernel as source package it is a headache and really painful.
All distros struggle already enough with kernel updates.

Regards

Marcel

2009-06-11 17:13:27

by Al Viro

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 10:05:02AM -0700, Ray Lee wrote:
> On Thu, Jun 11, 2009 at 10:00 AM, Christoph Hellwig<[email protected]> wrote:
> > On Thu, Jun 11, 2009 at 06:56:18PM +0200, Peter Zijlstra wrote:
> >> No, once a kernel with this syscall gets released we most certainly
> >> intend to maintain its ABI.
> >
> > So what point is there in keeping it in-tree except making life hell for
> > packagers?
>
> Packagers are quite used to taking a single source tree and building
> multiple packages out of it. This isn't rocket science. It's the
> multiple separate trees that need to be released in lock-step that are
> headaches.

Wrong. Remember the fun bisecting around sysfs/udev incompatible change?
Oops, went back past the cutoff line, got to downgrade udev for the next
boot. Oh, it oopses? Too fucking bad, can't just boot the previous kernel,
should've kept _two_ working ones so that with any userland state we could
come back to working system.

This isn't a rocket science, this is a goddamn load of horse manure.
Packages that need to be updated in the lock-step *are* headaches from
hell when you are trying to do development. Even if you have all of
them already built.

2009-06-11 17:22:50

by Ray Lee

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 10:12 AM, Al Viro<[email protected]> wrote:
> On Thu, Jun 11, 2009 at 10:05:02AM -0700, Ray Lee wrote:
>> Packagers are quite used to taking a single source tree and building
>> multiple packages out of it. This isn't rocket science. It's the
>> multiple separate trees that need to be released in lock-step that are
>> headaches.
>
> Wrong.  Remember the fun bisecting around sysfs/udev incompatible change?
> Oops, went back past the cutoff line, got to downgrade udev for the next
> boot.  Oh, it oopses?  Too fucking bad, can't just boot the previous kernel,
> should've kept _two_ working ones so that with any userland state we could
> come back to working system.
>
> This isn't a rocket science, this is a goddamn load of horse manure.
> Packages that need to be updated in the lock-step *are* headaches from
> hell when you are trying to do development.  Even if you have all of
> them already built.

Well, welcome to our new world order of Xorg and udev and hal. I have
had to deal with bisecting the problem just as you have, and dealt
with the fallout.

The choices are:

- Don't bisect, throw up your hands and hope someone else deals with it

- keep the old versions around for installs, as you point out (I do
this regularly)

- build all the packages every time

The last one is the most reasonable and I'd argue it's the right thing
to do. But it's tricky with multiple source trees -- which version of
udev works with this kernel again? A single source tree for packages
that are kept in lock-step, as so many seem to be, makes that a hell
of a lot easier on me.

But perhaps I'm an odd-ball.

I think your complaint is "Why the hell can't they have a stable ABI?"
Probably for the same reason anything so close to the hardware hasn't
had a stable ABI. I'm sure udev/hal/Xorg will have a stable
kernel-userland interface any day now. Once they do, I'm sure
everything else that touches the hardware so intimately will have a
stable ABI too.

Sheesh.

2009-06-11 17:59:45

by Pekka Enberg

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, 11 Jun 2009, Christoph Hellwig wrote:
>> So what point is there in keeping it in-tree except making life hell for
>> packagers?

On Thu, Jun 11, 2009 at 8:06 PM, Linus
Torvalds<[email protected]> wrote:
> Give it up. Packagers can trivially generate their own sub-packages. They
> do it all the time. They already do it for the user-mode header files,
> extracted from the kernel - something you've worked on yourself.
>
> So your point is clearly bogus, and dishonest.
>
> You haven't actually looked the real problem in the eye, and acknowledged
> the disaster that is oprofile. Let's give a _new_ approach a chance, and
> see if we can avoid the mistakes of yesteryear this time.

Yup, I wonder what all the fuss is about. We already have userspace
tools in the kernel but people keep putting them under Documentation
(to avoid this discussion, probably).

For those who think an external repository is a good idea, I invite
you to compare the success of kmemtrace (kernel memory profiler) and
perf. The former has its userspace part out-of-tree and has gained
zero new developers. Sure, there are probably fewer people interested
in memory profiling and I or Eduard surely don't have the sex appeal
of Ingo Molnar (yet anyway). But even if you take these factors into
account, I'd argue that big part of the success has been the fact that
it's easily accessible and hackable. And that pretty much means that
the code needs to sit in the kernel tree, following kernel development
process.

And really, what do we gain by moving perf out of tree and making it
follow its own release cycle (and getting out of sync eventually)?

Pekka

2009-06-11 18:04:57

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Linus Torvalds wrote:
> On Thu, 11 Jun 2009, Marcel Holtmann wrote:
>
>> so do you expect us to merge stuff like ip, iw, rfkill, crda, the WiMAX
>> tools, the Bluetooth ones and whatever else we have that all have the
>> same issues into the kernel source code as well?
>>
>
> No. Only stuff that I expect to be really close to hardware, and used for
> kernel purposes.

Whilst having no opinion on the matter, I can't help noticing that Ingo
said its "main focus is on measuring/profilig user-space apps" (sic), so
I think its use is not for kernel purposes. Coupled with Christoph's
observation that the tool is nowhere close to the hardware, it appears
neither of your two criteria apply to this.

2009-06-11 18:11:10

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Pekka Enberg wrote:
> And really, what do we gain by moving perf out of tree and making it
> follow its own release cycle (and getting out of sync eventually)?

I'm sure perf will change, for example as faults are discovered in it.
Perhaps, too, the kernel side counters will change, but will the ABI?
Peter Zijlstra's comment ("we most certainly intend to maintain its ABI")
implies it won't, or won't in such a way as to break user space tools.

What I'm saying is that this doesn't sound like something that needs
user-space in lock-step with kernel.

2009-06-11 18:22:48

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, David Newall wrote:
>
> What I'm saying is that this doesn't sound like something that needs
> user-space in lock-step with kernel.

Give it a rest.

If that's true, then in a year or two we can just split it up already.

But I note (once more) how _nobody_ has actually been able to accept the
fact that oprofile was an abject failure as it was split up. Instead, you
all dance around totally irrelevant issues.

Linus

2009-06-11 18:24:32

by Martin Bligh

Subject: Re: [GIT PULL] Performance Counters for Linux

>> So what point is there in keeping it in-tree except making life hell for
>> packagers?
>
> Give it up. Packagers can trivially generate their own sub-packages. They
> do it all the time. They already do it for the user-mode header files,
> extracted from the kernel - something you've worked on yourself.
>
> So your point is clearly bogus, and dishonest.
>
> You haven't actually looked the real problem in the eye, and acknowledged
> the disaster that is oprofile. Let's give a _new_ approach a chance, and
> see if we can avoid the mistakes of yesteryear this time.

We actually ended up coming to the same conclusion as you for some of the
internal tools we use that are tightly tied to the kernel. There is one hitch,
which is that if you boot between different kernel versions, you need multiple
userspace versions of the tools, so you may need to put them in
/lib/modules/<kernel-version> or something equivalent, not one fixed place.

M.
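
[ Illustration only, not part of Martin's message: a minimal C sketch of
the per-kernel-version layout he describes. The function names, the
"perf" tool name and the use of /lib/modules as the base directory are
assumptions made for the example; no existing tool is claimed to work
this way. ]

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Build "<base>/<release>/<tool>" into buf.
 * Returns 0 on success, -1 if the result does not fit in len bytes.
 */
int versioned_tool_path(char *buf, size_t len, const char *base,
                        const char *release, const char *tool)
{
    int n = snprintf(buf, len, "%s/%s/%s", base, release, tool);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Same as above, but takes the release string from the running
 * kernel (what "uname -r" prints).
 */
int running_kernel_tool_path(char *buf, size_t len, const char *base,
                             const char *tool)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return versioned_tool_path(buf, len, base, u.release, tool);
}
```

A wrapper could call running_kernel_tool_path() with "/lib/modules" as
the base and exec the resulting path, so that each installed kernel
picks up its own copy of the tool.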

2009-06-11 18:35:34

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Martin Bligh wrote:
>
> We actually ended up coming to the same conclusion as you for some of the
> internal tools we use that are tightly tied to the kernel. There is one hitch,
> which is that if you boot between different kernel versions, you need multiple
> userspace versions of the tools, so you may need to put them in
> /lib/modules/<kernel-version> or something equivalent, not one fixed place.

So I actually think this is broken.

No tool should ever be _that_ tightly tied to a kernel. If they are, they
are broken, plain and simple.

A stable user-space ABI is still a requirement.

What the "keep it in the kernel sources" approach hopefully allows is

- taking advantage of new features in a timely manner.

NOT with some ABI breakage, but simply things like supporting a new CPU
architecture or new counters. The thing that oprofile failed at so
badly in my experience.

- Make it easier for developers, and _avoiding_ the horrible situation
where you have two different groups that don't talk well to each other
because one is a group of user-space weenies, and the other is a group
of manly kernel people, and there is no common ground.

And no, I'm not going to "guarantee" that this works well. Again, I just
know that the separation didn't work. Let's just _try_ to do it this way,
and see how it works.

But at no point will it be acceptable to have kernel version dependencies.
Install the newest version of the binaries, and it should support older
kernels too (within reason).

The "within reason" is because (a) it's new, so early on you might see
breakage, and (b) because we do tend to allow system tools to break
occasionally. Not nearly often enough to make it valid to design around
it, though.

Linus
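
[ Illustration only, not part of Linus's message: one way a newer tool
binary can keep working on an older kernel is to probe for features in
order of preference and fall back when one is missing. This C sketch
assumes a hypothetical open callback and treats event names as plain
strings; old_kernel_open() is a stand-in for the example, not a real
kernel interface. ]

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical open callback: returns an fd >= 0 on success, or -1 if
 * the running kernel or PMU does not support the requested event.
 */
typedef int (*open_event_fn)(const char *event);

/*
 * Try the events in order of preference. Returns the fd of the first
 * one that opens and stores its index in *chosen, or returns -1 if
 * none of them is supported (*chosen is then left untouched).
 */
int open_first_supported(const char *const *events, size_t n,
                         open_event_fn open_event, size_t *chosen)
{
    for (size_t i = 0; i < n; i++) {
        int fd = open_event(events[i]);

        if (fd >= 0) {
            *chosen = i;
            return fd;
        }
    }
    return -1;
}

/* Stand-in for an older kernel: only the "cpu-clock" event exists. */
int old_kernel_open(const char *event)
{
    return strcmp(event, "cpu-clock") == 0 ? 3 : -1;
}
```

With the preference list { "cycles", "cpu-clock" }, the stand-in above
rejects the first entry and the tool ends up on the second, without
needing a different binary per kernel version.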

2009-06-11 18:39:03

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Linus Torvalds wrote:
> I note (once more) how _nobody_ has actually been able to accept the
> fact that oprofile was an abject failure as it was split up. Instead, you
> all dance around totally irrelevant issues.

Not I. I'm totally comfortable with your decision, even though it seems
counter to your own stated policy.

While I'm still unable to help noticing things, nobody seems to have
presented an argument that user-space and kernel-side will need to be
developed together. There's been an obvious assumption, with oprofile
given as that assumption's poster-boy, but why that should be the case
for tools/perf remains unclear. Probably the reasons are so obvious that
they go without saying, but as a disinterested observer, it seems to me
that in this case the two sides really are quite separate and
independent in a very real sense. Perhaps in the same sense that acct
and quota are.

2009-06-11 18:46:35

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, David Newall wrote:
>
> While I'm still unable to help noticing things, nobody seems to have
> presented an argument that user-space and kernel-side will need to be
> developed together.

Well, I tried to give two reasons in my reply to Martin (it's there
somewhere in cyberspace, a couple of minutes ago). Basically timeliness
(kernel features vs taking advantage of them) and co-development (there
always seems to be a huge impedance mis-match between user-level
developers and kernel developers).

Linus

2009-06-11 18:48:42

by Sam Ravnborg

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 09:26:55AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 11 Jun 2009, Christoph Hellwig wrote:
> >
> > Err, no. This adds tons of userspace code into tools/ which
> > should not be in the kernel tree but a proper package.
>
> I disagree.
>
> We've had tons of cases where we tried to "separate" the user-land code
> and the kernel code, in the name of "beauty" or whatever.
>
> It's almost invariably a disaster.

This is cheating. I had this as a topic for the kernel summit and
was looking forward to reading an interesting article about people
dancing on the table and fighting in the corners about it.
[I do not attend myself]

People say that this would be a nightmare for the packagers.
I frankly do not see what the issue is here.

We should be able to add the necessary stuff to create the few popular
package formats.
And tools, like kernels, can easily be updated 4 times a year - so the kernel
release frequency should be a non-issue too.

Others just say "no userspace in the kernel" - and I honestly have not understood why.


Where to draw the line?
We can ask a few simple questions:
- Is the tool part of a kernel hacker's toolbox?
- Is the tool maintained by kernel people?
- Is the tool updated with new features in the kernel (*)?

If the answer is yes it is a good candidate.

(*) No excuse for ABI changes..


Simple example. I needed vmstat on my embedded platform the other day.
Got lots of hits on Google but could not find the source - and gave up as I was busy.
[Today I found it on the second try - sigh.]

Sam

2009-06-11 18:51:35

by Christoph Hellwig

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 11:21:46AM -0700, Linus Torvalds wrote:
> But I note (once more) how _nobody_ has actually been able to accept the
> fact that oprofile was an abject failure as it was split up. Instead, you
> all dance around totally irrelevant issues.

I don't think oprofile has been a disaster because of any kind of split,
but because the design has been a failure from day 1.

2009-06-11 19:06:20

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Linus,

> > What I'm saying is that this doesn't sound like something that needs
> > user-space in lock-step with kernel.
>
> Give it a rest.
>
> If that's true, then in a year or two we can just split it up already.
>
> But I note (once more) how _nobody_ has actually been able to accept the
> fact that oprofile was an abject failure as it was split up. Instead, you
> all dance around totally irrelevant issues.

so your whole reasoning is based on the fact that oprofile was a
horrible failure. All the other projects/subsystems that manage this
perfectly successfully without breaking API/ABI abruptly and emerging
slowly when things change don't count. Stop dancing around oprofile so
much. It makes you dizzy ;)

Regards

Marcel

2009-06-11 19:07:49

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Linus Torvalds wrote:
> timeliness
> (kernel features vs taking advantage of them) and co-development (there
> always seems to be a huge impedance mis-match between user-level
> developers and kernel developers).

You seem to be saying that putting the code in kernel tree will make
user-level developers more responsive. FWIW (very little) I would have
quietly guessed the opposite result.

2009-06-11 19:24:59

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, David Newall wrote:
>
> You seem to be saying that putting the code in kernel tree will make
> user-level developers more responsive. FWIW (very little) I would have
> quietly guessed the opposite result.

No. I'm saying that if there's a big overlap with _kernel_ developers
(which there is), then they can maintain the tree.

Linus

2009-06-11 19:30:40

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Linus Torvalds wrote:
>
> No. I'm saying that if there's a big overlap with _kernel_ developers
> (which there is), then they can maintain the tree.

To take the oprofile example that decided it for me: the code to actually
support new processors was all done by basically kernel developers. And it
didn't hit user land for almost a year, because the user-land tools didn't
take the patch and propagate it up.

Linus

2009-06-11 19:36:18

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Linus Torvalds wrote:
> To take the oprofile example that decided it for me: the code to actually
> support new processors was all done by basically kernel developers. And it
> didn't hit user land for almost a year, because the user-land tools didn't
> take the patch and propagate it up.
>

Bad developer, Spot, you only did half the job. Not sure there's much
more one can say.

2009-06-11 19:37:40

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Sorry, Linus, for the duplicate post.

Linus Torvalds wrote:

> > I'm saying that if there's a big overlap with _kernel_ developers
> > (which there is), then they can maintain the tree.
>

Are you suggesting that maintaining it as a separate application is
harder for them? As previously observed, the user-space stuff is quite
divorced from the hardware; and its intended audience is user-space
developers and administrators. Did I miss something? (Obviously I didn't
miss that your decision has already been made.)

2009-06-11 19:50:25

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, David Newall wrote:

> Linus Torvalds wrote:
> > To take the oprofile example that decided it for me: the code to actually
> > support new processors was all done by basically kernel developers. And it
> > didn't hit user land for almost a year, because the user-land tools didn't
> > take the patch and propagate it up.
>
> Bad developer, Spot, you only did half the job. Not sure there's much
> more one can say.

Umm. The kernel developer _did_ do the job. The patch to the user land
side was available for that whole year. It just didn't get merged, and
then didn't get merged some more, and then got merged but only in a SVN
tree, not a release, and then finally when I did a bugzilla request to
fedora, they took the patch and put it in their distro.

Anyway, it's clearly not worth discussing this with you. I've tried. I
give up. Happily, I don't _need_ to convince you.

Linus

2009-06-11 19:59:25

by Andrew Morton

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, 11 Jun 2009 10:06:55 -0700 (PDT)
Linus Torvalds wrote:

> You haven't actually looked the real problem in the eye, and acknowledged
> the disaster that is oprofile. Let's give a _new_ approach a chance, and
> see if we can avoid the mistakes of yesteryear this time.

+1, metoo.

We've had numerous problems in the past where kernel developers have
shied away from altering or distributing userspace code. One effect of
this which we see again and again is that people shove presentation and
parsing code into the kernel which should have been in userspace.

It could be that shipping userspace code in the kernel bundle will
improve that situation. So let's give it a try. If it turns out to be
good, let's do it again. If it turns out to be bad, let's move perf
out of the kernel tree and not do it again.

2009-06-11 20:10:36

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Andrew Morton wrote:
>
> It could be that shipping userspace code in the kernel bundle will
> improve that situation. So let's give it a try. If it turns out to be
> good, let's do it again. If it turns out to be bad, let's move perf
> out of the kernel tree and not do it again.

Exactly. Right now, I use the oprofile experience as a reason for why we
should try to do this. But hey, who knows, in one year, maybe people will
use _this_ experience as a reason why we should never do it again.

We just don't know yet. But that's no reason not to try. Either way, we'll
hopefully learn something.

Or to quote Edison:
"I have not failed 700 times. I have not failed once. I have succeeded in
proving that those 700 ways will not work. When I have eliminated the
ways that will not work, I will find the way that will work."

Linus

2009-06-11 20:24:17

by Ingo Molnar

Subject: Re: [GIT PULL] Performance Counters for Linux


* Linus Torvalds wrote:

[...]
> What the "keep it in the kernel sources" approach hopefully allows is
>
> - taking advantage of new features in a timely manner.
>
> NOT with some ABI breakage, but simply things like supporting a
> new CPU architecture or new counters. The thing that oprofile
> failed at so badly in my experience.
>
> - Make it easier for developers, and _avoiding_ the horrible
> situation where you have two different groups that don't talk
> well to each other because one is a group of user-space
> weenies, and the other is a group of manly kernel people, and
> there is no common ground.

Yes, very much agreed.

Btw., here are a couple of other arguments why i find it useful to
have the tools/perf/ in the kernel repo:

1) Super-fast and synchronized release cycles

The kernel is one of the fastest moving packages in Linux - most
user-space packages have (much!) longer release cycles than 3
months.

A tight release schedule forces a certain amount of release
discipline on tooling as well - so i'm glad that the two will be
coupled. It's so easy for a promising tool to degrade into
tinkerware with odd release cycles with time - if it's part of the
kernel then at least the release cycles won't be odd but at a precise 3
months.

2) Performance _matters_

This is an argument pretty specific to perfcounters: Performance
analysis tools under Linux suck pretty summarily. Yet, one of the
major strengths of Linux is (or at least used to be) performance. So
i find it very fitting that the kernel community takes performance
analysis tooling into their own hands.

3) Strict quality control under a proven model

In the kernel repo i can be sure that:

- No one will even think of adding autofools to tools/perf/.

- No one will send us code with Hungarian notation and two spaces
tabulation.

- No one will put getopt.h into the code

- No one will rewrite it in some weird language

[ Or at least, even though such incidents might happen
occasionally, i can just sit back in my chair and watch the
resulting showdown on lkml, without having to worry about the
outcome ;-) ]

I can point contributors to well-established kernel coding
principles, without having to argue no end about them.

All in all - the Linux kernel is a fire breathing monster engine
when it comes to producing good software. Who says that this
infrastructure and experience can only be used to produce kernel
space code?

4) Code reuse

We actually use code from the kernel: list.h primitives and
rbtrees.c. We privatized them for now under
tools/perf/util/rbtree.[ch] and tools/perf/util/list.h because
there's some header and type pollution in them, but it would be nice
to include them directly and share the facilities.

5) Reality check for kernel developers

I think kernel hackers need a reality check too. It's easy to say
that user-space sucks - but now there's a way to channel that
frustration via direct action and make a real difference. I do hope
that the extra superfluous mental energies visible in this thread
can be used for good purposes too ;-)

6) It's a lot of fun

I never thought i'd say that - but hacking properly structured
user-space code in the kernel repo is serious fun. It's even
relaxing at times: i can be reasonably sure that i won't crash the
kernel.

All in all, we did this because we found that it produces better
code in practice and does it faster - and i don't think we should
rigidly limit the kernel repo to kernel-space projects alone.

Ingo

2009-06-11 20:49:28

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Ingo,

> > What the "keep it in the kernel sources" approach hopefully allows is
> >
> > - taking advantage of new features in a timely manner.
> >
> > NOT with some ABI breakage, but simply things like supporting a
> > new CPU architecture or new counters. The thing that oprofile
> > failed at so badly in my experience.
> >
> > - Make it easier for developers, and _avoiding_ the horrible
> > situation where you have two different groups that don't talk
> > well to each other because one is a group of user-space
> > weenies, and the other is a group of manly kernel people, and
> > there is no common ground.
>
> Yes, very much agreed.
>
> Btw., here are a couple of other arguments why i find it useful to
> have the tools/perf/ in the kernel repo:
>
> 1) Super-fast and synchronized release cycles
>
> The kernel is one of the fastest moving packages in Linux - most
> user-space packages have (much!) longer release cycles than 3
> months.

that might be true for some projects, but for others this is wrong. You
are just making an assumption out of thin air.

> A tight release schedule forces a certain amount of release
> discipline on tooling as well - so i'm glad that the two will be
> coupled. It's so easy for a promising tool to degrade into
> tinkerware with odd release cycles with time - if it's part of the
> kernel then at least the release cycles won't be odd but at a precise 3
> months.

And you can't do that within a perf.git tree on kernel.org because?

> 2) Performance _matters_
>
> This is an argument pretty specific to perfcounters: Performance
> analysis tools under Linux suck pretty summarily. Yet, one of the
> major strengths of Linux is (or at least used to be) performance. So
> i find it very fitting that the kernel community takes performance
> analysis tooling into their own hands.
>
> 3) Strict quality control under a proven model
>
> In the kernel repo i can be sure that:
>
> - No one will even think of adding autofools to tools/perf/.

That argument is nonsense. While autoconf/automake is maybe not to your
liking, nobody forces you to use it. Projects like git, iw etc. do
perfectly fine without it. I don't mind having autoconf/automake around.

> - No one will send us code with Hungarian notation and two spaces
> tabulation.

What kind of shitty argument is that? I enforce kernel coding style
in my userspace project all the time. No problem with that.

> - No one will put getopt.h into the code

And that is so bad because?

> - No one will rewrite it in some weird language

And they can do as they please. You don't have to accept the re-write.
These are all nonsense arguments. If you maintain a userspace project
properly then you will not see any of these problems.

> I can point contributors to well-established kernel coding
> principles, without having to argue no end about them.

Come on. A lot of projects use kernel coding style nowadays. That is not
a problem here.

> All in all - the Linux kernel is a fire breathing monster engine
> when it comes to producing good software. Who says that this
> infrastructure and experience can only be used to produce kernel
> space code?

And who says that all userspace people have no idea what they are doing?
We have a lot of successful projects that follow almost the same rules as
the kernel.

> 4) Code reuse
>
> We actually use code from the kernel: list.h primitives and
> rbtrees.c. We privatized them for now under
> tools/perf/util/rbtree.[ch] and tools/perf/util/list.h because
> there's some header and type pollution in them, but it would be nice
> to include them directly and share the facilities.

Let's see if you are making up an argument or if you are really trying to
work this out and solve it.

> 5) Reality check for kernel developers
>
> I think kernel hackers need a reality check too. It's easy to say
> that user-space sucks - but now there's a way to channel that
> frustration via direct action and make a real difference. I do hope
> that the extra superfluous mental energies visible in this thread
> can be used for good purposes too ;-)
>
> 6) It's a lot of fun
>
> I never thought i'd say that - but hacking properly structured
> user-space code in the kernel repo is serious fun. It's even
> relaxing at times: i can be reasonably sure that i won't crash the
> kernel.
>
> All in all, we did this because we found that it produces better
> code in practice and does it faster - and i don't think we should
> rigidly limit the kernel repo to kernel-space projects alone.

Linus has had a bad experience with oprofile and wants to try something new
and I can follow that argument to a certain degree. I don't agree with
it, but that is fine.

So you are saying that only good code comes from including it into
linux-2.6.git and otherwise you will never get there. Have you actually
tried to maintain this in a separate repository on kernel.org?

Regards

Marcel

2009-06-11 21:06:19

by Sam Ravnborg

Subject: Re: [GIT PULL] Performance Counters for Linux

>
> So you are saying that only good code comes from including it into
> linux-2.6.git and otherwise you will never get there. Have you actually
> tried to maintain this in a separate repository on kernel.org?

Could you please remind us what the arguments against including a few
selected tools within the kernel source tree were?

I ask because I really cannot see why so much noise is generated.
As a naive user who likes easy access to the stuff I work with,
this looks like an optimal place to find the kernel-hacking
tools I need. Why should I hunt somewhere else to find it?

Sam

2009-06-11 21:14:35

by Ingo Molnar

Subject: Re: [GIT PULL] Performance Counters for Linux


* Marcel Holtmann wrote:

> Hi Ingo,
>
> > > What the "keep it in the kernel sources" approach hopefully allows is
> > >
> > > - taking advantage of new features in a timely manner.
> > >
> > > NOT with some ABI breakage, but simply things like supporting a
> > > new CPU architecture or new counters. The thing that oprofile
> > > failed at so badly in my experience.
> > >
> > > - Make it easier for developers, and _avoiding_ the horrible
> > > situation where you have two different groups that don't talk
> > > well to each other because one is a group of user-space
> > > weenies, and the other is a group of manly kernel people, and
> > > there is no common ground.
> >
> > Yes, very much agreed.
> >
> > Btw., here are a couple of other arguments why i find it useful to
> > have the tools/perf/ in the kernel repo:
> >
> > 1) Super-fast and synchronized release cycles
> >
> > The kernel is one of the fastest moving packages in Linux - most
> > user-space packages have (much!) longer release cycles than 3
> > months.
>
> that might be true for some projects, but for others this is
> wrong. You are just making an assumption out of thin air.
>
> > A tight release schedule forces a certain amount of release
> > discipline on tooling as well - so i'm glad that the two will be
> > coupled. It's so easy for a promising tool to degrade into
> > tinkerware with odd release cycles with time - if it's part of
> > > the kernel then at least the release cycles won't be odd but at a
> > precise 3 months.
>
> And you can't do that within a perf.git tree on kernel.org
> because?

We actually tried the tools as separate code, and for the first
three months of the project we only got three contributions - while
the kernel code was essentially finished. (Pekka reported a similar
experience in this thread, with another tool that has close kernel
ties.)

Once we moved it into the same repo as the kernel code (three months
ago), the patches started flowing in - at an amazing rate. We now
have a dozen contributors, most of them kernel developers, and we
have over a hundred good changes to the tools - in just another 3
months.

The key difference was the location of the tools. It is very
convenient and productive to have a shared repository for a project
that frequently involves both kernel and tool changes.

So my point is: this model clearly works in practice and all the
current tools/perf/ contributors like this kind of coding
environment.

Most of your arguments seem to center around the notion that it
could all be done in a separate repo too and that such a repo could
be run as well as the Linux kernel. If you think you could do it
even better in a separate repo you are certainly free to try it.

Ingo

2009-06-11 21:17:27

by Marcel Holtmann

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi Sam,

> > So you are saying that only good code comes from including it into
> > linux-2.6.git and otherwise you will never get there. Have you actually
> > tried to maintain this in a separate repository on kernel.org?
>
> Could you please remind us what the arguments against including a few
> selected tools within the kernel source tree were?
>
> I ask because I really cannot see why so much noise is generated.
> As a naive user who likes easy access to the stuff I work with,
> this looks like an optimal place to find the kernel-hacking
> tools I need. Why should I hunt somewhere else to find it?

I personally would expect a perf.git on kernel.org for the userspace
tools for it. Like we have udev.git there, iproute2.git and others.

Seems to be working perfectly fine (except of course oprofile) and makes
packaging and security updates a lot easier. The distros always have a
really hard time releasing new kernel packages. And whenever the source
changes, the whole set of binary packages needs to be rebuilt, and in
theory, if you install a new kernel, you should reboot. So if there is
an issue in perf userspace, the current processes in most distros will
prompt the user to reboot for no good reason.

There is nothing wrong with trying something new, but to be honest I
don't buy into the arguments why we do it. It seems like it is all based
on bad experience with some userspace maintainers and not really
technical grounds why it is a must to have this inside the kernel source
code. Of course you can make the argument the other way around and say
why not. And I give Linus that he wants to try. However all the
arguments from Ingo are a joke and basically say that all userspace
developers have no clue and can't get it right anyway.

Maybe it is just a sneaky attempt to get a higher hit in Greg's
statistics by just writing some userspace code which otherwise would not
be counted ;)

Regards

Marcel

2009-06-11 21:24:26

by Sam Ravnborg

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 11:17:16PM +0200, Marcel Holtmann wrote:
> Hi Sam,
>
> > > So you are saying that only good code comes from including it into
> > > linux-2.6.git and otherwise you will never get there. Have you actually
> > > tried to maintain this in a separate repository on kernel.org?
> >
> > Could you please remind us what the arguments against including a few
> > selected tools within the kernel source tree were?
> >
> > I ask because I really cannot see why so much noise is generated.
> > As a naive user who likes easy access to the stuff I work with,
> > this looks like an optimal place to find the kernel-hacking
> > tools I need. Why should I hunt somewhere else to find it?
>
> I personally would expect a perf.git on kernel.org for the userspace
> tools for it. Like we have udev.git there, iproute2.git and others.
>
> Seems to be working perfectly fine (except of course oprofile) and makes
> packaging and security updates a lot easier.
There is nothing preventing us from adding support for rpm and source rpms.
So you just grab the relevant tree and issue a few commands and you have your
packages.
And for security fixes we have the stable kernels.

> The distros always have a
> really hard time releasing new kernel packages.
There is nothing that says that because the code lives inside
the kernel tree you _have_to_ release the full kernel source
to release a tool.

You mix up the fact that the source for the tool lives inside the
kernel with the way tools are packaged.
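
To illustrate the distinction: a packaging script can export just one
subdirectory of the kernel tree into its own source tarball. A rough
sketch in Python follows. The function names, the archive name and the
"perf" prefix are made up for the example; this is not an existing
kernel build target.

```python
import os
import tarfile


def export_subtree(tree_root, subdir, out_path, prefix):
    """Write only tree_root/subdir into a gzip tarball at out_path.

    Files are stored under prefix/ so the archive unpacks into its own
    directory, independent of the rest of the source tree.
    """
    src = os.path.join(tree_root, subdir)
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    with tarfile.open(out_path, "w:gz") as tar:
        # tar.add() recurses into directories by default.
        tar.add(src, arcname=prefix)
    return out_path


def list_members(archive_path):
    """Return the sorted names of the regular files in the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())
```

Something like export_subtree("linux-2.6", "tools/perf", "perf-src.tar.gz",
"perf") would then give a source archive that holds the tool alone, which
a package can be built from without shipping the whole kernel source.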

Sam

2009-06-11 22:00:01

by Steven Rostedt

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 11:17:16PM +0200, Marcel Holtmann wrote:
> Hi Sam,
>
> > > So you are saying that only good code comes from including it into
> > > linux-2.6.git and otherwise you will never get there. Have you actually
> > > tried to maintain this in a separate repository on kernel.org?
> >
> > Could you please remind us what the arguments against including a few
> > selected tools within the kernel source tree were?
> >
> > I ask because I really cannot see why so much noise is generated.
> > As a naive user who likes easy access to the stuff I work with,
> > this looks like an optimal place to find the kernel-hacking
> > tools I need. Why should I hunt somewhere else to find it?
>
> I personally would expect a perf.git on kernel.org for the userspace
> tools for it. Like we have udev.git there, iproute2.git and others.
>
> Seems to be working perfectly fine (except of course oprofile) and makes
> packaging and security updates a lot easier. The distros always have a
> really hard time releasing new kernel packages. And whenever the source
> changes, the whole set of binary packages needs to be rebuilt, and in
> theory, if you install a new kernel, you should reboot. So if there is
> an issue in perf userspace, the current processes in most distros will
> prompt the user to reboot for no good reason.
>
> There is nothing wrong with trying something new, but to be honest I
> don't buy into the arguments why we do it. It seems like it is all based
> on bad experience with some userspace maintainers and not really
> technical grounds why it is a must to have this inside the kernel source
> code. Of course you can make the argument the other way around and say
> why not. And I give Linus that he wants to try. However all the
> arguments from Ingo are a joke and basically say that all userspace
> developers have no clue and can't get it right anyway.

Here's another point that I have not really seen anyone make. The tools that
would be packaged with the kernel are the ones that I would expect the average
kernel developer to use. Things to help us in developing better code.

The tools you mentioned

"ip, iw, rfkill, crda, the WiMAX"

I have no idea what they do. I don't think I would use them as I don't
work on bluetooth, and I don't see how they would help me with what I do
work on.

I use 'udev' only to boot my machine, and I only notice it when it doesn't
work.

As for something like perf, that is something I can see myself using to
analyze my own code. And I can see other developers (even you) using it for
the same purpose. This is a tool that I would like to have the latest version
for the latest version of the kernel I am developing on. That is, if the
latest kernel had a new feature that perf can take advantage of, it would be
nice to have it with the new kernel I just pulled.

This could also work with a perf.git, but I would probably not bother with
it if I had to keep checking the perf.git repo to see if it uses the
new features that are in the kernel. I constantly do 'git pull' for the
kernel and I would get the latest perf with the latest kernel and I
would not need to bother checking someplace else.

Actually, I can also see that if a new feature in the kernel was added that
perf uses, I would probably notice it first with compiling perf and doing
a perf --help.

>
> Maybe it is just a sneaky attempt to get a higher hit in Greg's
> statistics by just writing some userspace code which otherwise would not
> be counted ;)

No, that would be something that I do ;-)

/me plans on sending patches for perf.

-- Steve

2009-06-11 22:18:59

by Jiri Slaby

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi.

On 06/11/2009 11:26 PM, Sam Ravnborg wrote:
> On Thu, Jun 11, 2009 at 11:17:16PM +0200, Marcel Holtmann wrote:
>> I personally would expect a perf.git on kernel.org for the userspace
>> tools for it. Like we have udev.git there, iproute2.git and others.
>>
>> Seems to be working perfectly fine (except of course oprofile) and makes
>> packaging and security updates a lot easier.
> There is nothing preventing us from adding support for rpm and source rpms.
> So you just grab the relevant tree and issue a few commands and you have your
> packages.

Bah, having 40M .src.rpm for a 5k binary package?

Maybe I'm missing something, how exactly do you conceive the packaging?
Or do you expect packagers to download a kernel package, untar it, get
tools/ dir, tar it and package? I hope not :).

And how would we cope with a different release cycle of the userspace
tool? If one rewrites a part totally independent of the kernel, do they
need to wait for the next kernel release? Or just merge it at any time
and packagers pick it up?

> And for security fixes we have the stable kernels.

So packagers will stick with the latest stable, right? With backporting
of (only stable) new fancy features from current git until next kernel
release.

2009-06-11 22:29:19

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, Jiri Slaby wrote:
>
> Bah, having 40M .src.rpm for a 5k binary package?

Why do people who don't even know how packaging works bother to even
participate in the discussion?

Look at how many git binary packages there are some day. For CVS users,
for SVN people, graphical tools etc. Do you think that each of them has a
source package?

No.

You can generate multiple binary packages from the same source package
(trivial example: debug builds etc). But you want to make a point, and
then YOU USE SOME DAMN IDIOTIC AND IGNORANT argument to do so.

Not smart.

Linus

2009-06-11 22:39:51

by Alan

Subject: Re: [GIT PULL] Performance Counters for Linux

> Look at how many git binary packages there are some day. For CVS users,
> for SVN people, graphical tools etc. Do you think that each of them has a
> source package?

That misses the point, at least for the systems as they work now. Having a
single source package to multiple binaries is easy. Managing setups where
you push only some of those binaries into the system gets really ugly.

A pile of git-foo packages works because you almost never push a fix to
one tool alone, and if you do it's small enough not to be a big deal.

2009-06-11 22:50:39

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Thu, 11 Jun 2009, Alan Cox wrote:
>
> That misses the point, at least for the systems as they work now. Having a
> single source package to multiple binaries is easy. Managing setups where
> you push only some of those binaries into the system gets really ugly.

Umm. But what's the problem?

Sure, you'd always update the 'kernel-perftool' package (or whatever you'd
call it) when you update the kernel. But so what? It's going to be tiny.
And appropriate.

IOW, where's the downside?

Linus

2009-06-11 23:20:18

by Al Viro

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 03:27:36PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 12 Jun 2009, Jiri Slaby wrote:
> >
> > Bah, having 40M .src.rpm for a 5k binary package?
>
> Why do people who don't even know how packaging works bother to even
> participate in the discussion?
>
> Look at how many git binary packages there are some day. For CVS users,
> for SVN people, graphical tools etc. Do you think that each of them has a
> source package?
>
> No.
>
> You can generate multiple binary packages from the same source package
> (trivial example: debug builds etc). But you want to make a point, and
> then YOU USE SOME DAMN IDIOTIC AND IGNORANT argument to do so.

Linus, the real question that needs to be answered is this:

What shall be done to ABI-breaking changes when users of that ABI are
in tools/*?

_That_ is the real issue. Because I can guarantee that there will be attempts
to use that as an excuse for ABI breakage. We have one specimen in this
thread already, complete with "oh, bisect problems don't matter, just rebuild
all packages" (and install them where, exactly? if it's, say, a break-the-boot
kind of incompatibility, how does one recover from running into a b0rken
kernel during bisect?)

If you are willing to ban that kind of crap - great; there are real remaining
issues (mostly with choosing the dependencies between binary packages), but
that's more or less survivable. If not... we'll have one hell of a PITA
to deal with when that kind of excuse gets actually used.

2009-06-11 23:26:32

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, Al Viro wrote:
>
> Linus, the real question that needs to be answered is this:

No it's not.

People have already told you that the intent isn't to change the ABI. So
your whole "hard-hitting journalism" is just bogus posturing.

What does this have to do with anything?

Linus

2009-06-12 00:26:56

by Al Viro

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 04:25:19PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 12 Jun 2009, Al Viro wrote:
> >
> > Linus, the real question that needs to be answered is this:
>
> No it's not.
>
> People have already told you that the intent isn't to change the ABI. So
> your whole "hard-hitting journalism" is just bogus posturing.
>
> What does this have to do with anything?

Oh, for... I can bloody well read, I've seen the reply from Peter and
I've no reasons to doubt his words (and if I had, I would've said so).
Not the issue. I don't know who you are confusing me with, but for the
record - I have no problem with this particular code being in tree.

I do have a problem with another thing: suggestions I've heard quite a few
times before; basically, "let's allow special breakable ABIs for use by
userland code living in kernel tree and tied to specific version". No,
I'm not saying that this is what's happening with that merge. But your
support for userland code in the tree (and BTW, I agree that it's a good
idea - hell, mount(8) makes a good candidate as far as I'm concerned) will
be parsed as green light for that. Has been already, in this thread.

So could you please clarify the situation? If the ABI compatibility
requirements remain the same as they used to be, whether the userland code
is in-tree or not, I'm fine with the entire thing. If they do not (and *ONLY*
in that case), I think we have a real problem.

For the record, I don't give a damn about packaging-related arguments and
theories about keeping userland source separate as a matter of some principle.
As far as I'm concerned, it's not a problem - as long as we take care of
later version's $TOOL working on older kernel as well as $TOOL from that
older kernel used to work, I'm fine with it.

I realize that multi-side flamefests are messy, but let's keep track of
who's saying what, OK?

Subject: Re: [GIT PULL] Performance Counters for Linux

On 11.06.09 12:49:25, Linus Torvalds wrote:
>
>
> On Fri, 12 Jun 2009, David Newall wrote:
>
> > Linus Torvalds wrote:
> > > To take the oprofile example that decided it for me: the code to actually
> > > support new processors was all done by basically kernel developers. And it
> > > didn't hit user land for almost a year, because the user-land tools didn't
> > > take the patch and propagate it up.
> >
> > Bad developer, Spot, you only did half the job. Not sure there's much
> > more one can say.
>
> Umm. The kernel developer _did_ do the job. The patch to the user land
> side was available for that whole year. It just didn't get merged, and
> then didn't get merged some more, and then got merged but only in a SVN
> tree, not a release, and then finally when I did a bugzilla request to
> fedora, they took the patch and put it in their distro.

Having the oprofile user land in the kernel would not solve the
problem. Then you would have code in the kernel tree you actually
don't want there: XML encoder, autoconf scripts, graphical tools, c++
code, man page docs, etc., and maybe a different coding style.

The problem is another one. First, as Christoph mentioned, it is a
design problem of oprofile. Changes in the kernel require user land
changes. This could be done better, but everybody knows it is hard to
change the user/kernel i/f and maybe, keep backward compatibility
too. So this is not easy to fix.

Second, there are different users with different expectations. (Linus,
I suggest oprofile has one user less, hmm...) Some run the latest
kernel on the latest systems, others use it in their clusters using
stable, rarely changing, well tested releases and hardware. If a
user land release aims more at the second group, it must conflict with
the first. Also, being in sync with the kernel would require release
cycles like the kernel's, which was the problem here with oprofile.

But user land patches exist, even on the day of the kernel
release. Otherwise the code would have been badly tested or not
tested at all. And the patches are also in a repository, _somewhere_.
This was true for oprofile too: the patches were in cvs at least on
the day the kernel was released. (I think a git repository would be
nicer, but that's a different question.) And this is the next problem:
the patches are somewhere, sometimes not under control of the kernel
developer. And this could be best solved if the kernel developer who
brings the kernel code upstream maintains a user land repository at
git.kernel.org. (Marcel already suggested this too.) It could hold
all patches required to run the latest kernel, based on the latest
user land release. (You can then blame the kernel maintainer if
something does not work.) And of course the user is then required to
compile the user land himself, as he does for the kernel. And maybe
distros pick up the patches too when adding a new kernel.

So, I think this would be much nicer than having a user land in the
kernel tree. And this would also solve the problems with the oprofile
user land.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
email: [email protected]

2009-06-12 03:00:00

by Linus Torvalds

Subject: Re: [GIT PULL] Performance Counters for Linux



On Fri, 12 Jun 2009, Al Viro wrote:
>
> So could you please clarify the situation? If the ABI compatibility
> requirements remain the same as they used to be, whether the userland code
> is in-tree or not, I'm fine with the entire thing. If they do not (and *ONLY*
> in that case), I think we have a real problem.

I think the ABI requirements are the same.

That said, I also suspect that as with oprofile itself, we'll end up
having expansions of the ABI that may well be CPU-specific. I also suspect
that there will probably be breakage early on just because things will
inevitably settle.

And I think that for something like a profiling tool, such breakage is
much more acceptable than for the actual binaries you'd profile. It's not
like we're talking about breaking the boot or functionality of a machine,
as happens when we break the X server (which has happened).

Linus

2009-06-12 03:22:17

by David Newall

Subject: Re: [GIT PULL] Performance Counters for Linux

Linus Torvalds wrote:
> On Fri, 12 Jun 2009, David Newall wrote:
>> Linus Torvalds wrote:
>>
>>> To take the oprofile example that decided it for me: the code to actually
>>> support new processors was all done by basically kernel developers. And it
>>> didn't hit user land for almost a year, because the user-land tools didn't
>>> take the patch and propagate it up.
>>>
>> Bad developer, Spot, you only did half the job. Not sure there's much
>> more one can say.
>>
>
> Umm. The kernel developer _did_ do the job. The patch to the user land
> side was available for that whole year.

I don't know this oprofile problem you had, only what you've said, which
is that somebody* did the kernel bit and somebody else did the userspace
bit; and the person doing the userspace bit was unresponsive so good
stuff got ignored for a year. That situation did not occur because the
userspace was out-of-tree, it occurred because you let it. You could
have given the userspace (back) to the kernel developer. That's what
you'd eventually do if a kernel sub-system maintainer became
unresponsive, isn't it?

*the singular is intended to include the plural and the male to include
female.


> Anyway, it's clearly not worth discussing this with you. I've tried. I
> give up. Happily, I don't _need_ to convince you.

Indeed, no, you don't need to convince me, particularly as I've made it
abundantly clear that I'm entirely happy with your decision. Notice I've
not argued with you, merely pointed out inconsistencies in what you've
said. I realise that can be annoying, and acknowledge your absolute
right to be as consistent or inconsistent as you choose. I wasn't (and
still am not) trying to be annoying, but to confirm there was no
confusion. If you hadn't seen these inconsistencies before, you surely
do now. That actually should be worth your while. I personally welcome
being corrected, and consider that a trait of an open mind.

2009-06-12 04:06:23

by Al Viro

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 07:58:37PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 12 Jun 2009, Al Viro wrote:
> >
> > So could you please clarify the situation? If the ABI compatibility
> > requirements remain the same as they used to be, whether the userland code
> > is in-tree or not, I'm fine with the entire thing. If they do not (and *ONLY*
> > in that case), I think we have a real problem.
>
> I think the ABI requirements are the same.

OK, then.

> That said, I also suspect that as with oprofile itself, we'll end up
> having expansions of the ABI that may well be CPU-specific. I also suspect
> that there will probably be breakage early on just because things will
> inevitably settle.
>
> And I think that for something like a profiling tool, such breakage is
> much more acceptable than for the actual binaries you'd profile. It's not
> like we're talking about breaking the boot or functionality of a machine,
> as happens when we break the X server (which has happened).

Sure.

2009-06-12 04:08:13

by Kyle McMartin

Subject: Re: [GIT PULL] Performance Counters for Linux

[With my Fedora on.]

On Thu, Jun 11, 2009 at 10:06:55AM -0700, Linus Torvalds wrote:
>
>
> On Thu, 11 Jun 2009, Christoph Hellwig wrote:
> >
> > So what point is there in keeping it in-tree except making life hell for
> > packagers?
>
> Give it up. Packagers can trivially generate their own sub-packages. They
> do it all the time. They already do it for the user-mode header files,
> extracted from the kernel - something you've worked on yourself.
>
> So your point is clearly bogus, and dishonest.
>
> You haven't actually looked the real problem in the eye, and acknowledged
> the disaster that is oprofile. Let's give a _new_ approach a chance, and
> see if we can avoid the mistakes of yesteryear this time.
>

This is actually somewhat complicated for (at least, I can only speak
from experience for...) Fedora and Debian/Ubuntu. Having this in-kernel
means any bugfixes needed for the 'perf' tool require patching the
kernel source, which will result in a whole new kernel rpm being built.
So in order to update their 'perf' tool, users will get a new kernel,
debuginfo, etc., with it.

So either we need to split it out into its own source tarball, or ship
the kernel source again in a separate source package. I know which I'm
going to tend to favour...

Obviously, I understand the reasons for doing this, but I don't really
see it as a sensible long term option for a mature tool. But,
whatever, it's not my call. We'll just work around whatever happens.

regards, Kyle

2009-06-12 07:35:49

by Alan

Subject: Re: [GIT PULL] Performance Counters for Linux

> Sure, you'd always update the 'kernel-perftool' package (or whatever you'd
> call it) when you update the kernel. But so what? It's going to be tiny.
> And appropriate.
>
> IOW, where's the downside?

When you need to update the perftool and not the kernel, which is very
likely to be the case early on. There are ways for vendors to cope anyway -
such as by deleting the tools directory from their kernel and keeping a
separate perftool package that gets updated now and then from the kernel
tree but is otherwise a fork.

Alan

2009-06-12 09:57:04

by Stephane Eranian

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi,

On Thu, Jun 11, 2009 at 6:03 PM, Ingo Molnar <[email protected]> wrote:

> The counter concept got objected to in past discussions on lkml, by
> DaveM and by Stephane Eranian (i've Cc:-ed them) - so this code was
> not eligible for linux-next testing - nevertheless we gave it good
> testing on PowerPC and x86 and i've done a wide cross-build test as
> well to try to make sure it breaks no other architecture.

I don't think you can quote me saying "I object to this code". I posted
a detailed review of the API and implementation on X86 outlining lots
of issues. Some got fixed, but many others are left unresolved at this
point. And I will post some more shortly.

I don't think that because this code is coming from you, it should be
allowed to short-circuit the established release process. You have
to respond to questions, fix issues like everybody else and if that
slows down the integration you cannot blame the reviewers for it.

2009-06-12 10:20:21

by Jörn Engel

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, 11 June 2009 17:59:41 -0400, Steven Rostedt wrote:
>
> The tools you mentioned
>
> "ip, iw, rfkill, crda, the WiMAX"
>
> I have no idea what they do.

ip I can explain to you. Ten years back when I was a netadmin I faced
the problem of implementing traffic shaping of some sorts. Details
don't matter much. After a very short while I learned that ip was the
solution to my problem. One week later I started digging into the
kernel code because I simply couldn't work out how to use this thing.
Another week later I was playing with the idea of writing my own traffic
shaper in the kernel instead. It was that bad.

Then I found something called tinybsd, a bsd distro on one floppy disk.
We allotted an old 486 with two network cards, I spent some
uncomfortable time configuring the beast with the crappy editor you can
expect on 1.44MB and the thing just worked henceforth.

Oh, the bsd had their equivalent of ip tightly coupled with their
kernel. Not sure if that caused the marked difference, but I'll gladly
add this shred of anecdotal support.

[ And in case someone takes offence or considers me an idiot for not
being able to use ip or tc, I would _love_ to see a howto explaining how
one can limit the amount of traffic on one interface to - say - 1GB per
month. ]

Jörn

--
Write programs that do one thing and do it well. Write programs to work
together. Write programs to handle text streams, because that is a
universal interface.
-- Doug MacIlroy

2009-06-12 10:28:27

by Ingo Molnar

Subject: Re: [GIT PULL] Performance Counters for Linux


* stephane eranian <[email protected]> wrote:

> Hi,
>
> On Thu, Jun 11, 2009 at 6:03 PM, Ingo Molnar <[email protected]> wrote:
>
> > The counter concept got objected to in past discussions on lkml,
> > by DaveM and by Stephane Eranian (i've Cc:-ed them) - so this
> > code was not eligible for linux-next testing - nevertheless we
> > gave it good testing on PowerPC and x86 and i've done a wide
> > cross-build test as well to try to make sure it breaks no other
> > architecture.
>
> I don't think you can quote me saying "I object to this code".
> [...]

I never saw you retract/change this negative opinion of yours about
the whole separate-counters concept:

" In summary, although the idea of simplifying tools by moving the
complexity elsewhere is legitimate, pushing it down to the
kernel is the wrong approach in my opinion, perfmon has avoided
that as much as possible for good reasons. "

http://lkml.org/lkml/2008/12/5/359

Do you like the concept now? That would be great news - you have a
lot of experience with various PMU details and we could certainly
welcome help with the perf tool and with the kernel side of
perfcounters!

> [...] I posted a detailed review of the API and implementation on
> X86 outlining lots of issues. Some got fixed, but many others are
> left unresolved at this point. And I will post some more shortly.

Hm, Peter replied to your mail a week ago, in detail. We addressed a
good number of issues pointed out by you, and we credited you for
them:

earth4:~/tip> git log v2.6.30..linus | grep 'Reported-by: Stephane Eranian'
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>
Reported-by: Stephane Eranian <[email protected]>

You were on the Cc: of the commit notifications. If you see issues
left unaddressed after reply+commit please repeat it - it probably
just got lost in noise.

> I don't think that because this code is coming from you, it should
> be allowed to short-circuit the established release process. You
> have to respond to questions, fix issues like everybody else and
> if that slows down the integration you cannot blame the reviewers
> for it.

There's three maintainers of perfcounters: Peter Zijlstra, Paul
Mackerras and me - and if some real problem missed the attention of
all of us then please repeat it - it probably was just missed in a
bigger mail or so. I certainly dont remember anything major. We
generally try to reply to any and all feedback.

Thanks,

Ingo

2009-06-15 13:56:57

by Giacomo A. Catenazzi

Subject: Re: [GIT PULL] Performance Counters for Linux

Sam Ravnborg wrote:
> On Thu, Jun 11, 2009 at 09:26:55AM -0700, Linus Torvalds wrote:
>>
>> On Thu, 11 Jun 2009, Christoph Hellwig wrote:
>>> Err, no. This adds tons of userspace code into tools/ which
>>> should not be in the kernel tree but a proper package.
>> I disagree.
>>
>> We've had tons of cases where we tried to "separate" the user-land code
>> and the kernel code, in the name of "beauty" of whatever.
>>
>> It's almost invariably a disaster.
>
> This is cheating. I had this as a topic for the kernel summit and
> was looking forward to read an interesting article about people
> dancing on the table and fighting in the corners about it.
> [I do not attend myself]
>
> People say that this would be a nightmare for the packagers.
> I frankly do not see what the issue is here.

Kernels don't fit well into distribution models.
We have had distributions for 15 years (and more), but support for
kernels is still hackish.

Kernel:
- people are used to installing multiple "parallel" kernels
- from different sources (distribution, kernel.org)
- and a lot of people configure their own kernel

This is very different from the usual packages:
- packages have dependencies (resolved at pre-installation time)
- packages normally support only upgrades (and not downgrades)
- support for multiple versions exists only for libraries (SONAME)

Thus a program can depend on a specific version of libc,
but it cannot depend on a specific kernel (the system doesn't know
which kernel the next boot will use), which requires a lot of hacks
in init.d scripts.

BTW one of the most frequent questions on distributions was
about configuring the kernel and the error about a missing
lib[n]curse[X]-dev[el].

So 15 years without finding a good solution could explain the
nightmare (but this could finally be the opportunity to really
solve the problem).

To conclude: a user space program should not only have a stable
ABI, but also give nice messages about unsupported features
(and a wrong kernel), and not change runtime dependencies like
socks.
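
As a rough illustration of the "nice messages" point, a tool could
translate the error left by a failed feature probe into something the
user can act on. This is only a sketch: the function names and the
wording of the messages are made up for the example, and the errno
mapping is a guess at typical cases, not what any existing tool does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Translate the errno left behind by a failed feature probe into a
 * message a user can act on. The mapping is illustrative only.
 */
const char *explain_probe_failure(int err)
{
	switch (err) {
	case ENOSYS:
		return "the running kernel does not provide this system call";
	case ENODEV:
	case EOPNOTSUPP:
		return "the kernel has the interface, but not this feature "
		       "on this hardware";
	case EACCES:
	case EPERM:
		return "permission denied";
	default:
		return strerror(err);
	}
}

/* Print a diagnostic; returns 0 if the feature is usable, -1 otherwise. */
int report_probe(const char *feature, int err, FILE *out)
{
	if (err == 0)
		return 0;
	fprintf(out, "%s: unavailable: %s\n", feature,
		explain_probe_failure(err));
	return -1;
}
```

A caller would pass the errno saved right after the failed call, e.g.
report_probe("hw-counters", errno, stderr), and carry on with whatever
reduced functionality still works.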

ciao
cate

2009-06-15 15:15:59

by Sam Ravnborg

Subject: Re: [GIT PULL] Performance Counters for Linux

>
> Kernels don't fit well into distribution models.
> We have had distributions for 15 years (and more), but support for
> kernels is still hackish.

The fact that the source is in the same git tree as the kernel does
not require the source to be considered a kernel.

You are starting from the wrong assumption that because the
source lives in the git tree of the kernel it has hard
dependencies on that particular kernel.
This is wrong. There are the same requirements about being
backward/forward compatible as if the tool
lived in a git tree outside the kernel.
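
One common way to keep an interface backward/forward compatible is to
have the caller state how large its view of the request structure is.
The sketch below is a generic illustration of that pattern, not a
description of the perfcounters ABI; struct req, its fields and
req_accept() are all hypothetical. The receiving side accepts an older,
shorter request by zero-filling the fields it did not get, and accepts
a newer, longer one only if the part it does not understand is all zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request structure; v2 appended 'flags' to the v1 layout. */
struct req {
	uint32_t size;     /* bytes the caller filled in, including this field */
	uint32_t event;
	uint64_t period;   /* end of v1 */
	uint64_t flags;    /* added in v2 */
};

#define REQ_SIZE_V1 offsetof(struct req, flags)

/*
 * Copy a caller-supplied request of any size into 'dst'.
 * Returns 0 on success, -1 if the request is malformed or uses
 * fields this side does not know about.
 */
int req_accept(struct req *dst, const void *src, size_t src_size)
{
	const unsigned char *bytes = src;
	uint32_t claimed;
	size_t i;

	if (src_size < REQ_SIZE_V1)
		return -1;
	memcpy(&claimed, bytes, sizeof(claimed));
	if (claimed != src_size)
		return -1;

	/* Newer caller: the tail we do not understand must be all zero. */
	for (i = sizeof(*dst); i < src_size; i++)
		if (bytes[i] != 0)
			return -1;

	/* Older caller: fields it did not send default to zero. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, bytes, src_size < sizeof(*dst) ? src_size : sizeof(*dst));
	dst->size = sizeof(*dst);
	return 0;
}
```

With this shape an old tool keeps working on a new kernel, and a new
tool works on an old kernel as long as it does not ask for anything the
old kernel lacks, wherever the tool's source happens to live.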

What we achieve by letting the userspace tools live inside
the kernel are:
- Kernel hacker tools are available for the kernel hackers
- The tools are easy to upgrade when we add new stuff to the kernel
- We know that the updates are done upstream
- We have the kernel base of developers at easy reach,
no need to build up a new development community

Sam

2009-06-18 21:58:50

by Stephane Eranian

Subject: Re: [GIT PULL] Performance Counters for Linux

Hi,

On Fri, Jun 12, 2009 at 12:28 PM, Ingo Molnar<[email protected]> wrote:
>
>>
>> On Thu, Jun 11, 2009 at 6:03 PM, Ingo Molnar <[email protected]> wrote:
>>
>> > The counter concept got objected to in past discussions on lkml,
>> > by DaveM and by Stephane Eranian (i've Cc:-ed them) - so this
>> > code was not eligible for linux-next testing - nevertheless we
>> > gave it good testing on PowerPC and x86 and i've done a wide
>> > cross-build test as well to try to make sure it breaks no other
>> > architecture.
>>
>> I don't think you can quote me saying "I object to this code".
>> [...]
>
> I never saw you retract/change this negative opinion of yours about
> the whole separate-counters concept:
>
>  " In summary, although the idea of simplifying tools by moving the
>    complexity elsewhere is legitimate, pushing it down to the
>    kernel is the wrong approach in my opinion, perfmon has avoided
>    that as much as possible for good reasons. "
>
>    http://lkml.org/lkml/2008/12/5/359
>

I, indeed, did not retract because I still have reservations about the approach
even after 6 months of intense development.

> Do you like the concept now? That would be great news - you have a
> lot of experience with various PMU details and we could certainly
> welcome help with the perf tool and with the kernel side of
> perfcounters!
>

I still have reservations. I could be convinced, though. But for that to
happen, there are a couple of milestones that need to be reached:
- Full Intel Nehalem support: core PMU, uncore PMU, LBR, PEBS
  (incl. load latency), offcore_response.
- Full Intel Itanium 2 dual-core (Montecito) support: D-EAR, I-EAR,
  opcode matching, range restrictions, user level control

Those represent very advanced and very useful PMUs. Having implemented
user and kernel support for both of them, I can attest that they
challenge any interfaces. But perfmon is the proof that those can be
exposed with their full strength thru a generic kernel API. Therefore,
I am relatively hopeful, there should be a way to expose them through
your API.

Another important consequence of your design is that the event
assignment logic is in the kernel. As discussed early on, this can be
quite complicated. Today, you only have very partial support for
architected Intel X86 and AMD64 processors (I know about Power). I am
sure you will update this shortly. But I think getting complete support
for Intel Nehalem and Itanium 2 in that area is another important
milestone.
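
To give a feel for why event assignment gets complicated, here is a toy
model of the problem: each event may only run on some of the counters,
and a set of events has to be mapped onto distinct counters. The code
is a plain backtracking search written for this illustration; it is not
taken from perfcounters or perfmon, and the names (assign_events,
allowed[], MAX_EVENTS) are invented.

```c
#include <stdint.h>

#define MAX_EVENTS 8

/*
 * Try to place events idx..n-1. 'used' is the bitmask of counters
 * already taken; allowed[i] is the bitmask of counters event i may
 * run on. On success assign[i] holds the counter chosen for event i.
 */
static int place(const uint32_t *allowed, int n, int idx,
		 uint32_t used, int *assign)
{
	uint32_t candidates;
	int c;

	if (idx == n)
		return 1;

	candidates = allowed[idx] & ~used;
	for (c = 0; c < 32; c++) {
		uint32_t bit = UINT32_C(1) << c;

		if (!(candidates & bit))
			continue;
		assign[idx] = c;
		if (place(allowed, n, idx + 1, used | bit, assign))
			return 1;
	}
	return 0;	/* nothing fits here: caller tries its next choice */
}

/* Returns 0 and fills assign[] if every event fits, -1 otherwise. */
int assign_events(const uint32_t *allowed, int n, int *assign)
{
	if (n < 0 || n > MAX_EVENTS)
		return -1;
	return place(allowed, n, 0, 0, assign) ? 0 : -1;
}
```

Even in this toy, handing out the lowest free counter first is not
enough: with allowed[] = { 0x3, 0x1 } the first event has to move to
counter 1 so that the second can have counter 0. Real PMUs add further
constraints on top of this.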

Concerning help, I am sure you realize I am already helping you out by
posting detailed reviews. I have yet to see anybody else posting this
kind of information concerning your API. I will keep posting as I find
new issues. My goal is not to torpedo this API, it's already upstream
anyway, but instead I am trying to ensure it does what I want based on
my experience developing tools, talking with PMU architects and feedback
from tool developers.

I think we could have a much more constructive working relationship if
people showed some more respect for the work I and many others have
done. Perfmon certainly has issues and could be implemented better. You
certainly have better skills than me in that area. I have no problem
admitting that. But I do not think perfmon deserves the kind of comments
I have seen, repeated over and over, from you and Zijlstra since
December. Regardless of your personal opinion, perfmon deserves some
credit for what it has offered to many people around the world. If it
had been as bad as you described it, it could not possibly have
supported all the PMUs and their advanced features. Nobody would have
used it. But this is not what happened.

>> [...]  I posted a detailed review of the API and implementation on
>> X86 outlining lots of issues. Some got fixed, but many others are
>> left unresolved at this point. And I will post some more shortly.
>
> Hm, Peter replied to your mail a week ago, in detail. We addressed a
> good number of issues pointed out by you, and we credited you for
> them:
>
> earth4:~/tip> git log v2.6.30..linus | grep 'Reported-by: Stephane Eranian'
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>    Reported-by: Stephane Eranian <[email protected]>
>
I know. I appreciate that. I wish you had also acknowledged the fact
that I suggested that you split the config field into type and config fields
in my initial posting. I had to discover this change by looking at the GIT
log.
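
For readers who have not followed the API discussion, the difference
between one packed config word and separate type and config fields can
be sketched as below. The thread only says that the field was split;
the struct names, the 8-bit class width and the bit positions here are
assumptions made up for the illustration.

```c
#include <stdint.h>

/* Hypothetical "before": event class packed into the top byte of one word. */
#define PACKED_TYPE_SHIFT 56
#define PACKED_ID_MASK    ((UINT64_C(1) << PACKED_TYPE_SHIFT) - 1)

struct packed_attr {
	uint64_t config;	/* [63:56] class, [55:0] event id */
};

/* Hypothetical "after": class and id live in separate fields. */
struct split_attr {
	uint32_t type;		/* event class */
	uint64_t config;	/* event id, all 64 bits available */
};

/* Unpack the single word into the two-field form. */
struct split_attr split_from_packed(struct packed_attr p)
{
	struct split_attr s;

	s.type = (uint32_t)(p.config >> PACKED_TYPE_SHIFT);
	s.config = p.config & PACKED_ID_MASK;
	return s;
}

/* Returns 0 on success, -1 if the values do not fit the packed form. */
int pack_from_split(struct split_attr s, struct packed_attr *out)
{
	if (s.type > 0xff || (s.config & ~PACKED_ID_MASK))
		return -1;
	out->config = ((uint64_t)s.type << PACKED_TYPE_SHIFT) | s.config;
	return 0;
}
```

The failing case in pack_from_split() is the point of the split: with
two fields, an event encoding that needs the high bits no longer
collides with the class selector.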

2009-06-22 13:11:17

by Ingo Molnar

Subject: Performance analysis under Linux (was: Re: [GIT PULL] Performance Counters for Linux)

* stephane eranian <[email protected]> wrote:

[...]
> Those represent very advanced and very useful PMUs. Having
> implemented user and kernel support for both of them, I can attest
> that they challenge any interfaces. But perfmon is the proof that
> those can be exposed with their full strength thru a generic
> kernel API. Therefore, I am relatively hopeful, there should be a
> way to expose them through your API.

The thing is, in my opinion the main challenge is not in and was
never in exposing as many PMU features as possible.

The main challenge is in:

_also making it a useful solution to developers/users_

That is a key area where IMO perfcounters and perfmon differ. The
challenge of performance analysis is in:

1) Making the tool usage patterns as natural as possible. Make the
tools transparent and configuration-less by default.

2) Offer people the same rough set of common and robust features
regardless of what CPU (or even architecture) they are on.
Adding some CPU-specific feature (without generalizing it at
the same time) _never_ worked well enough.

3) Visualize the information in a rich way, making it work in as
many development workflows as possible.

tools/perf/ offers the seeds to that - it is a "full solution"
attempt at sane performance analysis tooling. It tries to be
'Oprofile done right' and 'pfmon done right'.

'perf' tries to keep the best aspects of oprofile (its user-tooling
work-flow in essence), based on a robust and tightly integrated
kernel side - and tries to expand the range and type of analysis
that can be done.

I do claim we had few if any sane performance analysis tools before
under Linux, and i think we are still in the stone ages and still
have a lot of work to do in this area.

As a sidenote, IMO Linux has become somewhat vulnerable to creeping
featuritis in the past few years partly because we simply dont have
good enough tools that can _prove_ in an easy way that a patch is
having bad effects on performance.

I see many kernel developers using oprofile only as a last-ditch
option - and that's a pity - running a profiler and interpreting its
results should be as easy as editing a file or committing a change.

We've only scratched the surface really, and the main road ahead of us
is IMO not just in terms of PMU hw feature support depth (which is
relatively straightforward), but in terms of walking the full
distance and bringing it all to developers and putting it on their
desk.

So for every "will you support advanced PMU feature X, Y and Z"
question you ask, the first-level answer is: 'please show the
developer usecase and integrate it into our tools so we can see how
it all works and how useful it is'.

"A tool might want to do this" is not a good enough answer. We now
have a working OSS tool-space with 'perf' where such arguments for
more PMU features can be made in very specific terms: patches,
numbers and comparisons. Actual hands-on utility, happy developers
and faster apps is what matters in the end - not just the list of
PMU features we expose.

Ingo

2009-06-28 01:19:39

by Felipe Contreras

Subject: Re: [GIT PULL] Performance Counters for Linux

On Thu, Jun 11, 2009 at 11:23 PM, Ingo Molnar<[email protected]> wrote:
>
> * Linus Torvalds <[email protected]> wrote:
>
> [...]
>> What the "keep it in the kernel sources" approach hopefully allows is
>>
>>  - taking advantage of new features in a timely manner.
>>
>>    NOT with some ABI breakage, but simply things like supporting a
>>    new CPU architecture or new counters. The thing that oprofile
>>    failed at so badly in my experience.
>>
>>  - Make it easier for developers, and _avoiding_ the horrible
>>    situation where you have two different groups that don't talk
>>    well to each other because one is a group of user-space
>>    weenies, and the other is a group of manly kernel people, and
>>    there is no common ground.
>
> Yes, very much agreed.
>
> Btw., here are a couple of other arguments why i find it useful to
> have the tools/perf/ in the kernel repo:
>
> 1) Super-fast and synchronized release cycles
>
> The kernel is one of the fastest moving packages in Linux - most
> user-space packages have (much!) longer release cycles than 3
> months.
>
> A tight release schedule forces a certain amount of release
> discipline on tooling as well - so i'm glad that the two will be
> coupled. It's so easy for a promising tool to degrade over time
> into tinkerware with odd release cycles - if it's part of the
> kernel then at least the release cycles wont be odd but a precise
> 3 months.
>
> 2) Performance _matters_
>
> This is an argument pretty specific to perfcounters: Performance
> analysis tools under Linux suck pretty summarily. Yet, one of the
> major strengths of Linux is (or at least used to be) performance. So
> i find it very fitting that the kernel community takes performance
> analysis tooling into their own hands.
>
> 3) Strict quality control under a proven model
>
> In the kernel repo i can be sure that:
>
>  - No one will even think of adding autofools to tools/perf/.
>
>  - No one will send us code with Hungarian notation and two spaces
>    tabulation.
>
>  - No one will put getopt.h into the code
>
>  - No one will rewrite it in some weird language
>
>   [ Or at least, even though such incidents might happen
>     occasionally, i can just sit back in my chair and watch the
>     resulting showdown on lkml, without having to worry about the
>     outcome ;-) ]
>
> I can point contributors to well-established kernel coding
> principles, without having to argue no end about them.
>
> All in all - the Linux kernel is a fire breathing monster engine
> when it comes to producing good software. Who says that this
> infrastructure and experience can only be used to produce kernel
> space code?
>
> 4) Code reuse
>
> We actually use code from the kernel: list.h primitives and
> rbtree.c. We privatized them for now under
> tools/perf/util/rbtree.[ch] and tools/perf/util/list.h because
> there's some header and type pollution in them, but it would be nice
> to include them directly and share the facilities.
>
> 5) Reality check for kernel developers
>
> I think kernel hackers need a reality check too. It's easy to say
> that user-space sucks - but now there's a way to channel that
> frustration via direct action and make a real difference. I do hope
> that the extra superfluous mental energies visible in this thread
> can be used for good purposes too ;-)
>
> 6) It's a lot of fun
>
> I never thought i'd say that - but hacking properly structured
> user-space code in the kernel repo is serious fun. It's even
> relaxing at times: i can be reasonably sure that i wont crash the
> kernel.
>
> All in all, we did this because we found that it produces better
> code in practice and does it faster - and i dont think we should
> rigidly limit the kernel repo to kernel-space projects alone.

Here's another idea:
How about putting the tools in a different repo, but keeping the
communication on lkml? Then, if a change is required in both kernel
and user-space, both patches are sent to lkml and rejected if the
other side is missing.

My guess is that kernel hackers are comfortable doing stuff in the
linux repo because they know all the processes: code style, how to
send patches, git, etc. If they can just 'git clone' and type 'make'
to build the tools and send patches to lkml, I guess they'll be
equally happy hacking there too.

Cheers.

--
Felipe Contreras