2008-11-26 08:42:53

by Stephane Eranian

[permalink] [raw]
Subject: [patch 02/24] perfmon: base code

This patch adds the core functions of the perfmon subsystem.
It creates the toplevel perfmon subdirectory where all perfmon
generic code lives. Core functions implements initialization,
session allocation and termination.

Signed-off-by: Stephane Eranian <[email protected]>
--

Index: o3/perfmon/perfmon_attach.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_attach.c 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,86 @@
+/*
+ * perfmon_attach.c: perfmon2 load/unload functions
+ *
+ * This file implements the perfmon2 interface which
+ * provides access to the hardware performance counters
+ * of the host processor.
+ *
+ *
+ * The initial version of perfmon.c was written by
+ * Ganesh Venkitachalam, IBM Corp.
+ *
+ * Then it was modified for perfmon-1.x by Stephane Eranian and
+ * David Mosberger, Hewlett Packard Co.
+ *
+ * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
+ * by Stephane Eranian, Hewlett Packard Co.
+ *
+ * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ * David Mosberger-Tang <[email protected]>
+ *
+ * More information about perfmon available at:
+ * http://www.hpl.hp.com/research/linux/perfmon
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/perfmon_kern.h>
+#include "perfmon_priv.h"
+
+/**
+ * __pfm_exit_thread - detach and free context on thread exit
+ */
+void __pfm_exit_thread(void)
+{
+ struct pfm_context *ctx;
+ unsigned long flags;
+ int free_ok = 0, ret = -1;
+
+ ctx = current->pfm_context;
+
+ spin_lock_irqsave(&ctx->lock, flags);
+
+ PFM_DBG("state=%d is_self=%d", ctx->state, ctx->flags.is_self);
+
+ /*
+ * __pfm_unload_context() cannot fail
+ * in the context states we are interested in
+ */
+ switch (ctx->state) {
+ case PFM_CTX_LOADED:
+ ret = __pfm_unload_context(ctx);
+ break;
+ case PFM_CTX_ZOMBIE:
+ ret = __pfm_unload_context(ctx);
+ free_ok = 1;
+ break;
+ default:
+ BUG_ON(ctx->state != PFM_CTX_LOADED);
+ break;
+ }
+ spin_unlock_irqrestore(&ctx->lock, flags);
+
+ if (!ret)
+ pfm_session_release();
+
+ /*
+ * All memory free operations (especially for vmalloc'ed memory)
+ * MUST be done with interrupts ENABLED.
+ */
+ if (free_ok)
+ pfm_free_context(ctx);
+}
Index: o3/perfmon/perfmon_ctx.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_ctx.c 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,164 @@
+/*
+ * perfmon_ctx.c: perfmon2 context functions
+ *
+ * This file implements the perfmon2 interface which
+ * provides access to the hardware performance counters
+ * of the host processor.
+ *
+ *
+ * The initial version of perfmon.c was written by
+ * Ganesh Venkitachalam, IBM Corp.
+ *
+ * Then it was modified for perfmon-1.x by Stephane Eranian and
+ * David Mosberger, Hewlett Packard Co.
+ *
+ * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
+ * by Stephane Eranian, Hewlett Packard Co.
+ *
+ * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ * David Mosberger-Tang <[email protected]>
+ *
+ * More information about perfmon available at:
+ * http://www.hpl.hp.com/research/linux/perfmon
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/perfmon_kern.h>
+#include "perfmon_priv.h"
+
+/*
+ * context memory pool pointer
+ */
+static struct kmem_cache *pfm_ctx_cachep;
+
+/*
+ * This function is called when we need to perform asynchronous
+ * work on a context. This function is called ONLY when about to
+ * return to user mode (very much like with signal handling).
+ *
+ * we come here if:
+ *
+ * - we are zombie and we need to cleanup our state
+ *
+ * pfm_handle_work() can be called with interrupts enabled
+ * (TIF_NEED_RESCHED) or disabled.
+ */
+void pfm_handle_work(struct pt_regs *regs)
+{
+ struct pfm_context *ctx;
+ unsigned long flags;
+ int type;
+
+ if (!user_mode(regs))
+ return;
+
+ clear_thread_flag(TIF_PERFMON_WORK);
+
+ ctx = current->pfm_context;
+ if (ctx == NULL) {
+ PFM_DBG("[%d] has no ctx", current->pid);
+ return;
+ }
+
+ spin_lock_irqsave(&ctx->lock, flags);
+
+ type = ctx->flags.work_type;
+ ctx->flags.work_type = PFM_WORK_NONE;
+
+ PFM_DBG("work_type=%d", type);
+
+ switch (type) {
+ case PFM_WORK_ZOMBIE:
+ goto do_zombie;
+ default:
+ PFM_DBG("unkown type=%d", type);
+ goto nothing_todo;
+ }
+nothing_todo:
+ /*
+ * restore flags as they were upon entry
+ */
+ spin_unlock_irqrestore(&ctx->lock, flags);
+ return;
+
+do_zombie:
+ PFM_DBG("context is zombie, bailing out");
+
+ /* always returns 0 in this case */
+ __pfm_unload_context(ctx);
+
+ /*
+ * keep the spinlock check happy
+ */
+ spin_unlock(&ctx->lock);
+
+ /*
+ * enable interrupt for vfree()
+ */
+ local_irq_enable();
+
+ /*
+ * actual context free
+ */
+ pfm_free_context(ctx);
+
+ /*
+ * restore interrupts as they were upon entry
+ */
+ local_irq_restore(flags);
+
+ /*
+ * pfm_unload always successful, so can release
+ * session safely
+ */
+ pfm_session_release();
+}
+
+/**
+ * pfm_free_context - de-allocate context and associated resources
+ * @ctx: context to free
+ */
+void pfm_free_context(struct pfm_context *ctx)
+{
+ pfm_arch_context_free(ctx);
+
+ PFM_DBG("free ctx @0x%p", ctx);
+ kmem_cache_free(pfm_ctx_cachep, ctx);
+ /*
+ * decrease refcount on:
+ * - PMU description table
+ */
+ pfm_pmu_release();
+}
+
+/**
+ * pfm_init_ctx -- initialize context SLAB
+ *
+ * called from pfm_init
+ */
+int __init pfm_init_ctx(void)
+{
+ pfm_ctx_cachep = kmem_cache_create("pfm_context",
+ sizeof(struct pfm_context)+PFM_ARCH_CTX_SIZE,
+ SLAB_HWCACHE_ALIGN, 0, NULL);
+ if (!pfm_ctx_cachep) {
+ PFM_ERR("cannot initialize context slab");
+ return -ENOMEM;
+ }
+ return 0;
+}
Index: o3/perfmon/perfmon_file.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_file.c 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,94 @@
+/*
+ * perfmon_file.c: perfmon2 file input/output functions
+ *
+ * This file implements the perfmon2 interface which
+ * provides access to the hardware performance counters
+ * of the host processor.
+ *
+ * The initial version of perfmon.c was written by
+ * Ganesh Venkitachalam, IBM Corp.
+ *
+ * Then it was modified for perfmon-1.x by Stephane Eranian and
+ * David Mosberger, Hewlett Packard Co.
+ *
+ * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
+ * by Stephane Eranian, Hewlett Packard Co.
+ *
+ * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ * David Mosberger-Tang <[email protected]>
+ *
+ * More information about perfmon available at:
+ * http://perfmon2.sf.net
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+#include <linux/vfs.h>
+#include <linux/mount.h>
+#include <linux/perfmon_kern.h>
+#include "perfmon_priv.h"
+
+#define PFMFS_MAGIC 0xa0b4d889 /* perfmon filesystem magic number */
+
+struct pfm_controls pfm_controls = {
+ .task_group = PFM_GROUP_PERM_ANY,
+ .arg_mem_max = PAGE_SIZE,
+};
+
+static int __init enable_debug(char *str)
+{
+ pfm_controls.debug = 1;
+ PFM_INFO("debug output enabled\n");
+ return 1;
+}
+__setup("perfmon_debug", enable_debug);
+
+static int pfmfs_get_sb(struct file_system_type *fs_type,
+ int flags, const char *dev_name,
+ void *data, struct vfsmount *mnt)
+{
+ return get_sb_pseudo(fs_type, "pfm:", NULL, PFMFS_MAGIC, mnt);
+}
+
+static struct file_system_type pfm_fs_type = {
+ .name = "pfmfs",
+ .get_sb = pfmfs_get_sb,
+ .kill_sb = kill_anon_super,
+};
+
+/*
+ * pfmfs should _never_ be mounted by userland - too much of security hassle,
+ * no real gain from having the whole whorehouse mounted. So we don't need
+ * any operations on the root directory. However, we need a non-trivial
+ * d_name - pfm: will go nicely and kill the special-casing in procfs.
+ */
+static struct vfsmount *pfmfs_mnt;
+
+int __init pfm_init_fs(void)
+{
+ int err = register_filesystem(&pfm_fs_type);
+ if (!err) {
+ pfmfs_mnt = kern_mount(&pfm_fs_type);
+ err = PTR_ERR(pfmfs_mnt);
+ if (IS_ERR(pfmfs_mnt))
+ unregister_filesystem(&pfm_fs_type);
+ else
+ err = 0;
+ }
+ return err;
+}
Index: o3/perfmon/perfmon_init.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_init.c 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,84 @@
+/*
+ * perfmon.c: perfmon2 global initialization functions
+ *
+ * This file implements the perfmon2 interface which
+ * provides access to the hardware performance counters
+ * of the host processor.
+ *
+ *
+ * The initial version of perfmon.c was written by
+ * Ganesh Venkitachalam, IBM Corp.
+ *
+ * Then it was modified for perfmon-1.x by Stephane Eranian and
+ * David Mosberger, Hewlett Packard Co.
+ *
+ * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
+ * by Stephane Eranian, Hewlett Packard Co.
+ *
+ * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ * David Mosberger-Tang <[email protected]>
+ *
+ * More information about perfmon available at:
+ * http://www.hpl.hp.com/research/linux/perfmon
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/perfmon_kern.h>
+#include "perfmon_priv.h"
+
+/*
+ * external variables
+ */
+DEFINE_PER_CPU(struct task_struct *, pmu_owner);
+DEFINE_PER_CPU(struct pfm_context *, pmu_ctx);
+DEFINE_PER_CPU(u64, pmu_activation_number);
+
+int perfmon_disabled; /* >0 if perfmon is disabled */
+
+/*
+ * global initialization routine, executed only once
+ */
+int __init pfm_init(void)
+{
+ PFM_LOG("version %u.%u", PFM_VERSION_MAJ, PFM_VERSION_MIN);
+
+ if (pfm_init_ctx())
+ goto error_disable;
+
+ if (pfm_init_fs())
+ goto error_disable;
+
+ /*
+ * one time, arch-specific global initialization
+ */
+ if (pfm_arch_init())
+ goto error_disable;
+
+ return 0;
+
+error_disable:
+ PFM_ERR("perfmon is disabled due to initialization error");
+ perfmon_disabled = 1;
+ return -1;
+}
+
+/*
+ * must use subsys_initcall() to ensure that the perfmon2 core
+ * is initialized before any PMU description module when they are
+ * compiled in.
+ */
+subsys_initcall(pfm_init);
Index: o3/perfmon/perfmon_priv.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_priv.h 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,110 @@
+/*
+ * Copyright (c) 2001-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+
+#ifndef __PERFMON_PRIV_H__
+#define __PERFMON_PRIV_H__
+/*
+ * This file contains all the definitions of data structures, variables, macros
+ * that are to private to the generic code, i.e., not shared with any code that
+ * lives under arch/ or include/asm-XX
+ *
+ * For shared definitions, use include/linux/perfmon_kern.h
+ */
+
+#ifdef CONFIG_PERFMON
+
+/*
+ * context lazy save/restore activation count
+ */
+#define PFM_INVALID_ACTIVATION ((u64)~0)
+
+DECLARE_PER_CPU(u64, pmu_activation_number);
+
+static inline void pfm_set_pmu_owner(struct task_struct *task,
+ struct pfm_context *ctx)
+{
+ __get_cpu_var(pmu_owner) = task;
+ __get_cpu_var(pmu_ctx) = ctx;
+}
+
+int pfm_init_ctx(void);
+
+int pfm_session_acquire(void);
+void pfm_session_release(void);
+
+int pfm_init_sysfs(void);
+
+void pfm_free_context(struct pfm_context *ctx);
+
+void pfm_save_pmds(struct pfm_context *ctx);
+
+/*
+ * check_mask bitmask values for pfm_check_task_state()
+ */
+#define PFM_CMD_STOPPED 0x01 /* command needs thread stopped */
+#define PFM_CMD_UNLOADED 0x02 /* command needs ctx unloaded */
+#define PFM_CMD_UNLOAD 0x04 /* command is unload */
+
+/**
+ * pfm_save_prev_ctx - check if previous context exists and save state
+ *
+ * called from pfm_load_ctx_thread() and __pfm_ctxsin_thread() to
+ * check if previous context exists. If so saved its PMU state. This is used
+ * only for UP kernels.
+ *
+ * PMU ownership is not cleared because the function is always called while
+ * trying to install a new owner.
+ */
+static inline void pfm_check_save_prev_ctx(void)
+{
+#ifdef CONFIG_SMP
+ struct pfm_context *ctxp;
+
+ ctxp = __get_cpu_var(pmu_ctx);
+ if (!ctxp)
+ return;
+ /*
+ * in UP per-thread, due to lazy save
+ * there could be a context from another
+ * task. We need to push it first before
+ * installing our new state
+ */
+ pfm_save_pmds(ctxp);
+ /*
+ * do not clear ownership because we rewrite
+ * right away
+ */
+#endif
+}
+
+int pfm_init_fs(void);
+
+static inline void pfm_post_work(struct task_struct *task,
+ struct pfm_context *ctx, int type)
+{
+ ctx->flags.work_type = type;
+ set_tsk_thread_flag(task, TIF_PERFMON_WORK);
+}
+
+#define PFM_PMC_STK_ARG PFM_ARCH_PMC_STK_ARG
+#define PFM_PMD_STK_ARG PFM_ARCH_PMD_STK_ARG
+
+#endif /* CONFIG_PERFMON */
+
+#endif /* __PERFMON_PRIV_H__ */
Index: o3/perfmon/perfmon_res.c
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ o3/perfmon/perfmon_res.c 2008-11-25 17:54:36.000000000 +0100
@@ -0,0 +1,105 @@
+/*
+ * perfmon_res.c: perfmon2 resource allocations
+ *
+ * This file implements the perfmon2 interface which
+ * provides access to the hardware performance counters
+ * of the host processor.
+ *
+ * The initial version of perfmon.c was written by
+ * Ganesh Venkitachalam, IBM Corp.
+ *
+ * Then it was modified for perfmon-1.x by Stephane Eranian and
+ * David Mosberger, Hewlett Packard Co.
+ *
+ * Version Perfmon-2.x is a complete rewrite of perfmon-1.x
+ * by Stephane Eranian, Hewlett Packard Co.
+ *
+ * Copyright (c) 1999-2006 Hewlett-Packard Development Company, L.P.
+ * Contributed by Stephane Eranian <[email protected]>
+ * David Mosberger-Tang <[email protected]>
+ *
+ * More information about perfmon available at:
+ * http://perfmon2.sf.net
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+ * 02111-1307 USA
+ */
+#include <linux/kernel.h>
+#include <linux/perfmon_kern.h>
+#include "perfmon_priv.h"
+
+/*
+ * global information about all sessions
+ */
+struct pfm_resources {
+ cpumask_t sys_cpumask; /* bitmask of used cpus */
+ u32 thread_sessions; /* #num loaded per-thread sessions */
+};
+
+static struct pfm_resources pfm_res;
+
+static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pfm_res_lock);
+
+/**
+ * pfm_session_acquire - reserve a per-thread session
+ *
+ * return:
+ * 0 : success
+ * -EBUSY: if conflicting session exist
+ */
+int pfm_session_acquire(void)
+{
+ unsigned long flags;
+ int ret = 0;
+
+ /*
+ * validy checks on cpu_mask have been done upstream
+ */
+ spin_lock_irqsave(&pfm_res_lock, flags);
+
+ PFM_DBG("in thread=%u",
+ pfm_res.thread_sessions);
+
+ pfm_res.thread_sessions++;
+
+ PFM_DBG("out thread=%u ret=%d",
+ pfm_res.thread_sessions,
+ ret);
+
+ spin_unlock_irqrestore(&pfm_res_lock, flags);
+
+ return ret;
+}
+
+/**
+ * pfm_session_release - release a per-thread session
+ *
+ * called from __pfm_unload_context()
+ */
+void pfm_session_release(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&pfm_res_lock, flags);
+
+ PFM_DBG("in thread=%u",
+ pfm_res.thread_sessions);
+
+ pfm_res.thread_sessions--;
+
+ PFM_DBG("out thread=%u",
+ pfm_res.thread_sessions);
+
+ spin_unlock_irqrestore(&pfm_res_lock, flags);
+}
Index: o3/include/linux/sched.h
===================================================================
--- o3.orig/include/linux/sched.h 2008-11-25 17:51:29.000000000 +0100
+++ o3/include/linux/sched.h 2008-11-25 17:54:36.000000000 +0100
@@ -1352,6 +1352,9 @@
unsigned long default_timer_slack_ns;

struct list_head *scm_work_list;
+#ifdef CONFIG_PERFMON
+ struct pfm_context *pfm_context;
+#endif
};

/*

--


2008-11-26 11:07:18

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

On Wed, Nov 26, 2008 at 12:42:00AM -0800, [email protected] wrote:
> + * __pfm_unload_context() cannot fail
> + * in the context states we are interested in
> + */
> + switch (ctx->state) {
> + case PFM_CTX_LOADED:
> + ret = __pfm_unload_context(ctx);
> + break;
> + case PFM_CTX_ZOMBIE:
> + ret = __pfm_unload_context(ctx);
> + free_ok = 1;
> + break;
> + default:
> + BUG_ON(ctx->state != PFM_CTX_LOADED);

That is just BUG() because the condition is always true here.

> + *
> + * we come here if:
> + *
> + * - we are zombie and we need to cleanup our state

...
> +void pfm_handle_work(struct pt_regs *regs)
> +{
> + struct pfm_context *ctx;
> + unsigned long flags;
> + int type;
> +
> + if (!user_mode(regs))
> + return;

zombies are never in user mode. Either the comment
is wrong or the test.

> + * enable interrupt for vfree()
> + */
> + local_irq_enable();
> +
> + /*
> + * actual context free
> + */
> + pfm_free_context(ctx);
> +
> + /*
> + * restore interrupts as they were upon entry
> + */
> + local_irq_restore(flags);

That seems redundant after the local_irq_enable() above.

> + */
> +#include <linux/kernel.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +#include <linux/vfs.h>
> +#include <linux/mount.h>
> +#include <linux/perfmon_kern.h>
> +#include "perfmon_priv.h"
> +
> +#define PFMFS_MAGIC 0xa0b4d889 /* perfmon filesystem magic number */

This should be probably somewhere else.

> + return 1;
> +}
> +__setup("perfmon_debug", enable_debug);

There's a new generic dynamic_printk.h now that might be an alternative.

> +{
> + PFM_LOG("version %u.%u", PFM_VERSION_MAJ, PFM_VERSION_MIN);
> +
> + if (pfm_init_ctx())
> + goto error_disable;
> +
> + if (pfm_init_fs())
> + goto error_disable;
> +
> + /*
> + * one time, arch-specific global initialization
> + */
> + if (pfm_arch_init())
> + goto error_disable;
> +
> + return 0;
> +
> +error_disable:
> + PFM_ERR("perfmon is disabled due to initialization error");
> + perfmon_disabled = 1;
> + return -1;

I presume failure is very unlikely, but this has leaks under various
error conditions, hasn't it?

If they are very unlikely a comment that you don't handle it
intentionally is enough, but at least the comment if not
proper code should be there.

> +{
> +#ifdef CONFIG_SMP
> + struct pfm_context *ctxp;
> +
> + ctxp = __get_cpu_var(pmu_ctx);
> + if (!ctxp)
> + return;
> + /*
> + * in UP per-thread, due to lazy save

But that's in CONFIG_SMP?

Also that assumption might be not true on !x86?

> + * there could be a context from another
> + * task. We need to push it first before
> + * installing our new state
> + */
> + pfm_save_pmds(ctxp);
> + /*
> + * do not clear ownership because we rewrite
> + * right away
> + */
> +#endif
> +}
> + * global information about all sessions
> + */
> +struct pfm_resources {
> + cpumask_t sys_cpumask; /* bitmask of used cpus */
> + u32 thread_sessions; /* #num loaded per-thread sessions */
> +};
> +
> +static struct pfm_resources pfm_res;
> +
> +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pfm_res_lock);

Would it make sense to share other global perfmon context
on the same cache line?

> + */
> + spin_lock_irqsave(&pfm_res_lock, flags);
> +
> + PFM_DBG("in thread=%u",
> + pfm_res.thread_sessions);
> +
> + pfm_res.thread_sessions++;
> +
> + PFM_DBG("out thread=%u ret=%d",
> + pfm_res.thread_sessions,
> + ret);
> +
> + spin_unlock_irqrestore(&pfm_res_lock, flags);

The usual pattern would be to just do this with an atomic
counter. I suppose it wouldn't buy scalability, but perhaps
shorter code.

-Andi

--
[email protected]

2008-11-27 11:24:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

Stephane,

On Wed, 26 Nov 2008, [email protected] wrote:

> +/*
> + * pfmfs should _never_ be mounted by userland - too much of security hassle,
> + * no real gain from having the whole whorehouse mounted. So we don't need
> + * any operations on the root directory. However, we need a non-trivial
> + * d_name - pfm: will go nicely and kill the special-casing in procfs.
> + */

Can we please use anon_inodefs instead of creating a new one ? It's
exaclty for this purpose.

See fs/anon_inodes.c and fs/timerfd.c for an usage example.

Thanks,

tglx

2008-11-27 17:25:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

On Wed, 26 Nov 2008, [email protected] wrote:

> Index: o3/perfmon/perfmon_res.c

> +/*
> + * global information about all sessions
> + */
> +struct pfm_resources {
> + cpumask_t sys_cpumask; /* bitmask of used cpus */
> + u32 thread_sessions; /* #num loaded per-thread sessions */
> +};

What's the purpose of this being a structure if it's just a single
instance ?

> +static struct pfm_resources pfm_res;
> +
> +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pfm_res_lock);

> +/**
> + * pfm_session_acquire - reserve a per-thread session
> + *
> + * return:
> + * 0 : success
> + * -EBUSY: if conflicting session exist

Where ?

> + */
> +int pfm_session_acquire(void)
> +{
> + unsigned long flags;
> + int ret = 0;
> +
> + /*
> + * validy checks on cpu_mask have been done upstream
> + */

How please ? pfm_res.sys_cpumask is local to this file and you want
to check it under the lock and _before_ you increment
thread_sessions blindly.

> + spin_lock_irqsave(&pfm_res_lock, flags);
> +
> + PFM_DBG("in thread=%u",
> + pfm_res.thread_sessions);
> +
> + pfm_res.thread_sessions++;
> +
> + PFM_DBG("out thread=%u ret=%d",
> + pfm_res.thread_sessions,
> + ret);
> +
> + spin_unlock_irqrestore(&pfm_res_lock, flags);
> +
> + return ret;
> +}
> +
> +/**
> + * pfm_session_release - release a per-thread session
> + *
> + * called from __pfm_unload_context()
> + */
> +void pfm_session_release(void)
> +{
> + unsigned long flags;
> +
> + spin_lock_irqsave(&pfm_res_lock, flags);
> +
> + PFM_DBG("in thread=%u",
> + pfm_res.thread_sessions);
> +
> + pfm_res.thread_sessions--;
> +
> + PFM_DBG("out thread=%u",
> + pfm_res.thread_sessions);


What's the value of these debugs ? Prove that the compiler managed to
compile "pfm_res.thread_sessions--;" correctly ?

A WARN_ON(!pfm_res.thread_sessions) instead of blindly decrementing
would be way more useful.

> + spin_unlock_irqrestore(&pfm_res_lock, flags);
> +}

<adding the bits from the oprofile patch, which belong here as they
are not x86 specific>

+
+/**
+ * pfm_session_allcpus_acquire - acquire per-cpu sessions on all available cpus
+ *
+ * currently used by Oprofile on X86
+ */
+int pfm_session_allcpus_acquire(void)

+ for_each_online_cpu(cpu) {
+ cpu_set(cpu, pfm_res.sys_cpumask);
+ nsys_cpus++;
+ }

Sigh, why do we need a loop to copy a bitfield ?

+/**
+ * pfm_session_allcpus_release - relase per-cpu sessions on all cpus
+ *
+ * currently used by Oprofile code
+ */
+void pfm_session_allcpus_release(void)
+{
+ unsigned long flags;
+ u32 nsys_cpus, cpu;
+
+ spin_lock_irqsave(&pfm_res_lock, flags);
+
+ nsys_cpus = cpus_weight(pfm_res.sys_cpumask);
+
+ PFM_DBG("in sys=%u task=%u",
+ nsys_cpus,
+ pfm_res.thread_sessions);
+
+ /*
+ * XXX: could use __cpus_clear() with nbits
+ */

__cpus_clear(pfm_res.sys_cpumask, nsys_cpus); ????

That'd be real fun with a sparse mask.

+ for_each_online_cpu(cpu) {
+ cpu_clear(cpu, pfm_res.sys_cpumask);
+ nsys_cpus--;
+ }

Yuck. cpus_clear() perhaps ?

+EXPORT_SYMBOL(pfm_session_allcpus_release);

All what that code should do (in fact it does not) is preventing the
mix of thread and system wide sessions.

It does neither need a cpumask nor tons of useless loops and debug
outputs with zero value.

static int global_session;
static int thread_sessions;
static DEFINE_SPINLOCK(session_lock);

int pfm_session_request(int global)
{
unsigned long flags;
int res = -EBUSY;

spin_lock_irqsave(&session_lock, flags);

if (!global && !global_session) {
thread_sessions++;
res = 0;
}

if (global && !thread_sessions && !global_session) {
global_session = 1;
res = 0;
}

spin_unlock_irqrestore(&session_lock, flags);
return res;
}

void pfm_session_release(int global)
{
unsigned long flags;

spin_lock_irqsave(&session_lock, flags);

if (global) {
WARN_ON(!global_session);
global_session = 0;
} else {
if (!global_session && thread_sessions)
thread_session--;
else
WARN();
}

spin_unlock_irqrestore(&session_lock, flags);
}

Would do it nicely including useful sanity checks and 70% less code.

Thanks,

tglx

2008-11-27 17:48:17

by Stephane Eranian

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

Thomas,



On Thu, Nov 27, 2008 at 6:24 PM, Thomas Gleixner <[email protected]> wrote:
> On Wed, 26 Nov 2008, [email protected] wrote:
>
>> Index: o3/perfmon/perfmon_res.c
>
>> +/*
>> + * global information about all sessions
>> + */
>> +struct pfm_resources {
>> + cpumask_t sys_cpumask; /* bitmask of used cpus */
>> + u32 thread_sessions; /* #num loaded per-thread sessions */
>> +};
>
> What's the purpose of this being a structure if it's just a single
> instance ?
>
There is a single instance.
I was just trying to regroup related fields together instead of having them as
separate variables. I can make the change.


>> +static struct pfm_resources pfm_res;
>> +
>> +static __cacheline_aligned_in_smp DEFINE_SPINLOCK(pfm_res_lock);
>
>> +/**
>> + * pfm_session_acquire - reserve a per-thread session
>> + *
>> + * return:
>> + * 0 : success
>> + * -EBUSY: if conflicting session exist
>
> Where ?

Not in the patchset, conflict can arise when you add system-wide sessions.

>
>> + */
>> +int pfm_session_acquire(void)
>> +{
>> + unsigned long flags;
>> + int ret = 0;
>> +
>> + /*
>> + * validy checks on cpu_mask have been done upstream
>> + */
>
> How please ? pfm_res.sys_cpumask is local to this file and you want
> to check it under the lock and _before_ you increment
> thread_sessions blindly.
>
Stale comment.

>
> +EXPORT_SYMBOL(pfm_session_allcpus_release);
>
> All what that code should do (in fact it does not) is preventing the
> mix of thread and system wide sessions.
>
True. That is a current limitation.

> It does neither need a cpumask nor tons of useless loops and debug
> outputs with zero value.
>
Well, the the cpumask is indeed needed but you don't see the reason
why in the patchset!

Perfmon in system-wide does not operate like Oprofile. In system-wide
a perfmon session (context) monitors only ONE CPU at a time. Each
session is independent of each other. You can therefore measure different
things on different CPUs. Reservation is thus done independently for each
CPU, therefore we need a cpu bitmask to track allocation.

The Oprofile reservation you see is built on top of the cpumask reservation.
It tries to allocate in one call and atomically ALL the CPUs as this is the way
Oprofile operates. Thus it fails if one perfmon system-wide session or one
perfmon per-thread exists. I believe there is enough bitmask functions to
avoid loops as you pointed out. I will fix that.


> static int global_session;
> static int thread_sessions;
> static DEFINE_SPINLOCK(session_lock);
>
> int pfm_session_request(int global)
> {
> unsigned long flags;
> int res = -EBUSY;
>
> spin_lock_irqsave(&session_lock, flags);
>
> if (!global && !global_session) {
> thread_sessions++;
> res = 0;
> }
>
> if (global && !thread_sessions && !global_session) {
> global_session = 1;
> res = 0;
> }
>
> spin_unlock_irqrestore(&session_lock, flags);
> return res;
> }
>
> void pfm_session_release(int global)
> {
> unsigned long flags;
>
> spin_lock_irqsave(&session_lock, flags);
>
> if (global) {
> WARN_ON(!global_session);
> global_session = 0;
> } else {
> if (!global_session && thread_sessions)
> thread_session--;
> else
> WARN();
> }
>
> spin_unlock_irqrestore(&session_lock, flags);
> }
>
> Would do it nicely including useful sanity checks and 70% less code.
>
> Thanks,
>
> tglx
>

2008-11-27 18:29:17

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

Stephane,

On Thu, 27 Nov 2008, stephane eranian wrote:
> > What's the purpose of this being a structure if it's just a single
> > instance ?
> >
> There is a single instance.
> I was just trying to regroup related fields together instead of having them as
> separate variables. I can make the change.

Well, if you do a structure then put the lock in it as well, so its on
one cacheline.

> >> + * -EBUSY: if conflicting session exist
> >
> > Where ?
>
> Not in the patchset, conflict can arise when you add system-wide sessions.

Well, conflicts arise when oprofile is running as well, isn't it ?

> > How please ? pfm_res.sys_cpumask is local to this file and you want
> > to check it under the lock and _before_ you increment
> > thread_sessions blindly.
> >
> Stale comment.

Well, where is it checked ? Where is checked whether Oprofile runs or not ?

> > All what that code should do (in fact it does not) is preventing the
> > mix of thread and system wide sessions.
> >
> True. That is a current limitation.
>
> > It does neither need a cpumask nor tons of useless loops and debug
> > outputs with zero value.
> >
> Well, the the cpumask is indeed needed but you don't see the reason
> why in the patchset!

If its not needed now, then we can either remove it or do at least
something useful with it.

> Perfmon in system-wide does not operate like Oprofile. In system-wide
> a perfmon session (context) monitors only ONE CPU at a time. Each

Then it is a percpu session and not system wide. Please use less
confusing names.

> session is independent of each other. You can therefore measure different
> things on different CPUs. Reservation is thus done independently for each
> CPU, therefore we need a cpu bitmask to track allocation.

Ok. Question: if you do a one CPU wide session with perfom, can you
still do thread monitoring on the same CPU ?

If no, what prevents that a monitored thread is migrated to such a CPU ?

> The Oprofile reservation you see is built on top of the cpumask reservation.
> It tries to allocate in one call and atomically ALL the CPUs as this is the way
> Oprofile operates. Thus it fails if one perfmon system-wide session or one
> perfmon per-thread exists.

This only prevents oprofile from starting, but it does neither prevent
thread sessions nor does it prevent a perfmon per cpu session on a cpu
which was onlined after oprofile started, simply because it's bit is
missing in the CPU mask.

Oprofile if active starts profiling on cpu hotplug, but if you look at
the cpumask with a perfmon per cpu request it will succeed.

If you do resource management and that is what the file claims to do,
then you need to do it in a consistent way:

Oprofile can only run, when no thread and per cpu perfmon jobs are active.

Perfmon per cpu and thread jobs can only run when oprofile is not active.

Not sure about the thread vs. per cpu perfmon situation. See question above.

Thanks,

tglx

2008-11-27 18:31:29

by Andi Kleen

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

> Well, where is it checked ? Where is checked whether Oprofile runs or not ?

That is done using the perfctr reservation. I saw that somewhere in the
patchkit. The NMI watchdog uses that too.

> > The Oprofile reservation you see is built on top of the cpumask reservation.
> > It tries to allocate in one call and atomically ALL the CPUs as this is the way
> > Oprofile operates. Thus it fails if one perfmon system-wide session or one
> > perfmon per-thread exists.
>
> This only prevents oprofile from starting, but it does neither prevent
> thread sessions nor does it prevent a perfmon per cpu session on a cpu
> which was onlined after oprofile started, simply because it's bit is
> missing in the CPU mask.

The perfctr reservation is global over all CPUs.

-Andi

--
[email protected]

2008-11-27 18:37:23

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

On Thu, 27 Nov 2008, Andi Kleen wrote:

> > Well, where is it checked ? Where is checked whether Oprofile runs or not ?
>
> That is done using the perfctr reservation. I saw that somewhere in the
> patchkit. The NMI watchdog uses that too.
>
> > > The Oprofile reservation you see is built on top of the cpumask reservation.
> > > It tries to allocate in one call and atomically ALL the CPUs as this is the way
> > > Oprofile operates. Thus it fails if one perfmon system-wide session or one
> > > perfmon per-thread exists.
> >
> > This only prevents oprofile from starting, but it does neither prevent
> > thread sessions nor does it prevent a perfmon per cpu session on a cpu
> > which was onlined after oprofile started, simply because it's bit is
> > missing in the CPU mask.
>
> The perfctr reservation is global over all CPUs.

So this mean we manage resources on multiple levels, some bits here
and some bits there and five checks in each code path do get them all.

Really convincing concept.

Thanks,

tglx

2008-11-27 18:52:18

by Stephane Eranian

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

thomas,

On Thu, Nov 27, 2008 at 7:28 PM, Thomas Gleixner <[email protected]> wrote:
> Stephane,
>
> On Thu, 27 Nov 2008, stephane eranian wrote:
>> > What's the purpose of this being a structure if it's just a single
>> > instance ?
>> >
>> There is a single instance.
>> I was just trying to regroup related fields together instead of having them as
>> separate variables. I can make the change.
>
> Well, if you do a structure then put the lock in it as well, so its on
> one cacheline.
>
Good point.

>> >> + * -EBUSY: if conflicting session exist
>> >
>> > Where ?
>>
>> Not in the patchset, conflict can arise when you add system-wide sessions.
>
> Well, conflicts arise when oprofile is running as well, isn't it ?
>
Correct.

>> > How please ? pfm_res.sys_cpumask is local to this file and you want
>> > to check it under the lock and _before_ you increment
>> > thread_sessions blindly.
>> >
>> Stale comment.
>
> Well, where is it checked ? Where is checked whether Oprofile runs or not ?
>
It does not care whether it is Oprofile, NMI or any other subsystem.
What matters
is:
- what PMU registers are available?
- what CPU are not used for monitoring?
- are there per-thread sessions running?

>> > All what that code should do (in fact it does not) is preventing the
>> > mix of thread and system wide sessions.
>> >
>> True. That is a current limitation.
>>
>> > It does neither need a cpumask nor tons of useless loops and debug
>> > outputs with zero value.
>> >
>> Well, the the cpumask is indeed needed but you don't see the reason
>> why in the patchset!
>
> If its not needed now, then we can either remove it or do at least
> something useful with it.
>
That something useful is the reserve all or nothing for Oprofile.

>> Perfmon in system-wide does not operate like Oprofile. In system-wide
>> a perfmon session (context) monitors only ONE CPU at a time. Each
>
> Then it is a percpu session and not system wide. Please use less
> confusing names.
>
I know that. I have used this name since the beginning, it's more legacy
than anything else. Let's call this cpu-wide mode. I think people are more
familiar with the notion of system-wide than cpu-wide.


>> session is independent of each other. You can therefore measure different
>> things on different CPUs. Reservation is thus done independently for each
>> CPU, therefore we need a cpu bitmask to track allocation.
>
> Ok. Question: if you do a one CPU wide session with perfom, can you
> still do thread monitoring on the same CPU ?
>
No. They are currently mutually exclusive.

> If no, what prevents that a monitored thread is migrated to such a CPU ?
>
Nothing. AND you don't want to change affinity because you are monitoring.
So the current restriction is that cpu-wide and per-thread are
mutually exclusive.
The only way to avoid that is to partition the PMU register so each can co-exist
on the same CPU. I have not reached that point yet. They are also some hardware
limitations which prevent that from being implemented, e.g., on Itanium.

>> The Oprofile reservation you see is built on top of the cpumask reservation.
>> It tries to allocate in one call and atomically ALL the CPUs as this is the way
>> Oprofile operates. Thus it fails if one perfmon system-wide session or one
>> perfmon per-thread exists.
>
> This only prevents oprofile from starting, but it does neither prevent
> thread sessions nor does it prevent a perfmon per cpu session on a cpu
> which was onlined after oprofile started, simply because it's bit is
> missing in the CPU mask.
>
That is a good point!

The test needs to be more sophisticated than that. I guess we can keep the
'global' variable you've introduced and check against that first, then check
individual bits for conflicting perfmon cpu-wide session.

> Oprofile if active starts profiling on cpu hotplug, but if you look at
> the cpumask with a perfmon per cpu request it will succeed.
>
> If you do resource management and that is what the file claims to do,
> then you need to do it in a consistent way:
>
> Oprofile can only run, when no thread and per cpu perfmon jobs are active.
>
> Perfmon per cpu and thread jobs can only run when oprofile is not active.
>
> Not sure about the thread vs. per cpu perfmon situation. See question above.

2008-11-27 18:54:44

by Stephane Eranian

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

Thomas,

On Thu, Nov 27, 2008 at 7:36 PM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 27 Nov 2008, Andi Kleen wrote:
>
>> > Well, where is it checked ? Where is checked whether Oprofile runs or not ?
>>
>> That is done using the perfctr reservation. I saw that somewhere in the
>> patchkit. The NMI watchdog uses that too.
>>
>> > > The Oprofile reservation you see is built on top of the cpumask reservation.
>> > > It tries to allocate in one call and atomically ALL the CPUs as this is the way
>> > > Oprofile operates. Thus it fails if one perfmon system-wide session or one
>> > > perfmon per-thread exists.
>> >
>> > This only prevents oprofile from starting, but it does neither prevent
>> > thread sessions nor does it prevent a perfmon per cpu session on a cpu
>> > which was onlined after oprofile started, simply because it's bit is
>> > missing in the CPU mask.
>>
>> The perfctr reservation is global over all CPUs.
>
> So this mean we manage resources on multiple levels, some bits here
> and some bits there and five checks in each code path do get them all.
>
> Really convincing concept.

That is a different level of resource management: register level.

Both Perfmon and Oprofile can operate with fewer registers than what the
hardware actually supports. Take an AMD64 CPU with 4 counters. With
the NMI watchdog active, Perfmon or Oprofile will run with only 3. That's
better than nothing and is sufficient for many measurments.

2008-11-27 19:36:32

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

Stephane,

On Thu, 27 Nov 2008, stephane eranian wrote:

> >> session is independent of each other. You can therefore measure different
> >> things on different CPUs. Reservation is thus done independently for each
> >> CPU, therefore we need a cpu bitmask to track allocation.
> >
> > Ok. Question: if you do a one CPU wide session with perfom, can you
> > still do thread monitoring on the same CPU ?
> >
> No. They are currently mutually exclusive.
>
> > If no, what prevents that a monitored thread is migrated to such a CPU ?
> >
> Nothing. AND you don't want to change affinity because you are monitoring.
> So the current restriction is that cpu-wide and per-thread are
> mutually exclusive.

And how is this achieved ? Currently there seems nothing which
prevents a per-thread vs. cpu-wide monitoring.

Thanks,

tglx

2008-11-27 19:49:36

by Stephane Eranian

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

On Thu, Nov 27, 2008 at 8:35 PM, Thomas Gleixner <[email protected]> wrote:
> Stephane,
>
> On Thu, 27 Nov 2008, stephane eranian wrote:
>
>> >> session is independent of each other. You can therefore measure different
>> >> things on different CPUs. Reservation is thus done independently for each
>> >> CPU, therefore we need a cpu bitmask to track allocation.
>> >
>> > Ok. Question: if you do a one CPU wide session with perfom, can you
>> > still do thread monitoring on the same CPU ?
>> >
>> No. They are currently mutually exclusive.
>>
>> > If no, what prevents that a monitored thread is migrated to such a CPU ?
>> >
>> Nothing. AND you don't want to change affinity because you are monitoring.
>> So the current restriction is that cpu-wide and per-thread are
>> mutually exclusive.
>
> And how is this achieved ? Currently there seems nothing which
> prevents a per-thread vs. cpu-wide monitoring.
>
That's true, but that's because cpu-wide support is not included in the
patchset.

2008-11-27 21:00:12

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 02/24] perfmon: base code

On Thu, 27 Nov 2008, stephane eranian wrote:

> On Thu, Nov 27, 2008 at 8:35 PM, Thomas Gleixner <[email protected]> wrote:
> > Stephane,
> >
> > On Thu, 27 Nov 2008, stephane eranian wrote:
> >
> >> >> session is independent of each other. You can therefore measure different
> >> >> things on different CPUs. Reservation is thus done independently for each
> >> >> CPU, therefore we need a cpu bitmask to track allocation.
> >> >
> >> > Ok. Question: if you do a one CPU wide session with perfom, can you
> >> > still do thread monitoring on the same CPU ?
> >> >
> >> No. They are currently mutually exclusive.
> >>
> >> > If no, what prevents that a monitored thread is migrated to such a CPU ?
> >> >
> >> Nothing. AND you don't want to change affinity because you are monitoring.
> >> So the current restriction is that cpu-wide and per-thread are
> >> mutually exclusive.
> >
> > And how is this achieved ? Currently there seems nothing which
> > prevents a per-thread vs. cpu-wide monitoring.
> >
> That's true, but that's because cpu-wide support is not included in the
> patchset.

That's the whole point I'm making. For the current patch set the
simple global vs. thread exclusivity is sufficient and correct.

When we gradually add stuff then we simply can add the extra checks
and think about the impact and consequences in that context.

Thanks,

tglx