2007-09-21 23:32:48

by Guillaume Chazarain

[permalink] [raw]
Subject: [PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones

TASKSTATS_CMD_ATTR_TGID used to return only the delay accounting stats, not
the basic and extended accounting. With this patch,
TASKSTATS_CMD_ATTR_TGID also aggregates the accounting info for all threads
of a thread group.

TASKSTATS_CMD_ATTR_PID output should be unchanged
TASKSTATS_CMD_ATTR_TGID output should have all fields set, unlike before the
patch where most of the fiels were set to 0.

To this aim, two functions were introduced: fill_threadgroup() and add_tsk().
These functions are responsible for aggregating the subsystem specific
accounting information. Taskstats requesters (fill_pid(), fill_tgid() and
fill_tgid_exit()) should only call add_tsk() and fill_threadgroup() to get the
stats.

Signed-off-by: Guillaume Chazarain <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Jonathan Lim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---

Documentation/accounting/getdelays.c | 2 -
include/linux/tsacct_kern.h | 12 ++-
kernel/taskstats.c | 131 ++++++++++++++++++++++------------
kernel/tsacct.c | 106 ++++++++++++++++------------
4 files changed, 155 insertions(+), 96 deletions(-)

diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
index cbee3a2..78773c0 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -76,7 +76,7 @@ static void usage(void)
fprintf(stderr, "getdelays [-dilv] [-w logfile] [-r bufsize] "
"[-m cpumask] [-t tgid] [-p pid]\n");
fprintf(stderr, " -d: print delayacct stats\n");
- fprintf(stderr, " -i: print IO accounting (works only with -p)\n");
+ fprintf(stderr, " -i: print IO accounting\n");
fprintf(stderr, " -l: listen forever\n");
fprintf(stderr, " -v: debug on\n");
}
diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 7e50ac7..93dffc2 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,17 +10,23 @@
#include <linux/taskstats.h>

#ifdef CONFIG_TASKSTATS
-extern void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
#else
-static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
+static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
{}
#endif /* CONFIG_TASKSTATS */

#ifdef CONFIG_TASK_XACCT
-extern void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
+void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
extern void acct_update_integrals(struct task_struct *tsk);
extern void acct_clear_integrals(struct task_struct *tsk);
#else
+static inline void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{}
static inline void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
{}
static inline void acct_update_integrals(struct task_struct *tsk)
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 059431e..ce43fae 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -168,6 +168,68 @@ static void send_cpu_listeners(struct sk_buff *skb,
up_write(&listeners->sem);
}

+/**
+ * fill_threadgroup - initialize some common stats for the thread group
+ * @stats: the taskstats to write into
+ * @task: the thread representing the whole group
+ *
+ * There are two types of taskstats fields when considering a thread group:
+ * - those that can be aggregated from each thread in the group (like CPU
+ * times),
+ * - those that cannot be aggregated (like UID) or are identical (like
+ * memory usage), so are taken from the group leader.
+ * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
+ * the second.
+ */
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{
+ /*
+ * Each accounting subsystem adds calls to its functions to initialize
+ * relevant parts of struct taskstsats for a single tgid as follows:
+ *
+ * per-task-foo-fill_threadgroup(stats, task);
+ */
+
+ stats->version = TASKSTATS_VERSION;
+
+ /* fill in basic acct fields */
+ bacct_fill_threadgroup(stats, task);
+
+ /* fill in extended acct fields */
+ xacct_fill_threadgroup(stats, task);
+}
+
+/**
+ * add_tsk - combine some thread specific stats in a taskstats
+ * @stats: the taskstats to write into
+ * @task: the thread to combine
+ *
+ * Stats specific to each thread in the thread group. Stats of @task should be
+ * combined with those already present in @stats. add_tsk() works in
+ * conjunction with fill_threadgroup(), taskstats fields should not be touched
+ * by both functions.
+ */
+static void add_tsk(struct taskstats *stats, struct task_struct *task)
+{
+ /*
+ * Each accounting subsystem adds calls to its functions to combine
+ * relevant parts of struct taskstsats for a single pid as follows:
+ *
+ * per-task-foo-add_tsk(stats, task);
+ */
+ stats->nvcsw += task->nvcsw;
+ stats->nivcsw += task->nivcsw;
+
+ /* fill in delay acct fields */
+ delayacct_add_tsk(stats, task);
+
+ /* fill in basic acct fields */
+ bacct_add_tsk(stats, task);
+
+ /* fill in extended acct fields */
+ xacct_add_tsk(stats, task);
+}
+
static int fill_pid(pid_t pid, struct task_struct *tsk,
struct taskstats *stats)
{
@@ -185,23 +247,8 @@ static int fill_pid(pid_t pid, struct task_struct *tsk,
get_task_struct(tsk);

memset(stats, 0, sizeof(*stats));
- /*
- * Each accounting subsystem adds calls to its functions to
- * fill in relevant parts of struct taskstsats as follows
- *
- * per-task-foo(stats, tsk);
- */
-
- delayacct_add_tsk(stats, tsk);
-
- /* fill in basic acct fields */
- stats->version = TASKSTATS_VERSION;
- stats->nvcsw = tsk->nvcsw;
- stats->nivcsw = tsk->nivcsw;
- bacct_add_tsk(stats, tsk);
-
- /* fill in extended acct fields */
- xacct_add_tsk(stats, tsk);
+ add_tsk(stats, tsk);
+ fill_threadgroup(stats, tsk);

/* Define err: label here if needed */
put_task_struct(tsk);
@@ -233,31 +280,20 @@ static int fill_tgid(pid_t tgid, struct task_struct *first,
memset(stats, 0, sizeof(*stats));

tsk = first;
- do {
- if (tsk->exit_state)
- continue;
- /*
- * Accounting subsystem can call its functions here to
- * fill in relevant parts of struct taskstsats as follows
- *
- * per-task-foo(stats, tsk);
- */
- delayacct_add_tsk(stats, tsk);
-
- stats->nvcsw += tsk->nvcsw;
- stats->nivcsw += tsk->nivcsw;
- } while_each_thread(first, tsk);
-
+ do
+ if (!tsk->exit_state)
+ /*
+ * This check is racy as a thread could exit just right
+ * now and have its statistics accounted twice.
+ */
+ add_tsk(stats, tsk);
+ while_each_thread(first, tsk);
+
+ fill_threadgroup(stats, first->group_leader);
unlock_task_sighand(first, &flags);
rc = 0;
out:
rcu_read_unlock();
-
- stats->version = TASKSTATS_VERSION;
- /*
- * Accounting subsytems can also add calls here to modify
- * fields of taskstats.
- */
return rc;
}

@@ -267,19 +303,14 @@ static void fill_tgid_exit(struct task_struct *tsk)
unsigned long flags;

spin_lock_irqsave(&tsk->sighand->siglock, flags);
- if (!tsk->signal->stats)
- goto ret;

/*
- * Each accounting subsystem calls its functions here to
- * accumalate its per-task stats for tsk, into the per-tgid structure
- *
- * per-task-foo(tsk->signal->stats, tsk);
+ * The fill_threadgroup() part of the statistics will be added by the
+ * stats requester, i.e. fill_tgid() or taskstats_exit()
*/
- delayacct_add_tsk(tsk->signal->stats, tsk);
-ret:
+ add_tsk(tsk->signal->stats, tsk);
+
spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
- return;
}

static int add_del_listener(pid_t pid, cpumask_t *maskp, int isadd)
@@ -508,7 +539,13 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
if (!stats)
goto err;

+ /*
+ * Race here: the last thread may have not finished its
+ * fill_tgid_exit(), leaving an incomplete tsk->signal->stats
+ */
+
memcpy(stats, tsk->signal->stats, sizeof(*stats));
+ fill_threadgroup(stats, tsk->group_leader);

send:
send_cpu_listeners(rep_skb, listeners);
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 4ab1b58..9541a1a 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -22,54 +22,68 @@
#include <linux/acct.h>
#include <linux/jiffies.h>

-/*
- * fill in basic accounting fields
- */
-void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
{
struct timespec uptime, ts;
s64 ac_etime;

- BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
-
/* calculate task elapsed time in timespec */
do_posix_clock_monotonic_gettime(&uptime);
ts = timespec_sub(uptime, tsk->start_time);
/* rebase elapsed time to usec */
ac_etime = timespec_to_ns(&ts);
do_div(ac_etime, NSEC_PER_USEC);
- stats->ac_etime = ac_etime;
- stats->ac_btime = get_seconds() - ts.tv_sec;
- if (thread_group_leader(tsk)) {
- stats->ac_exitcode = tsk->exit_code;
- if (tsk->flags & PF_FORKNOEXEC)
+ stats->ac_etime = ac_etime;
+ stats->ac_btime = get_seconds() - ts.tv_sec;
+}
+
+/*
+ * fill in basic accounting fields
+ */
+
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+{
+ BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
+
+ rcu_read_lock();
+ stats->ac_ppid = pid_alive(task) ?
+ rcu_dereference(task->real_parent)->tgid : 0;
+ rcu_read_unlock();
+
+ fill_wall_times(stats, task);
+
+ if (thread_group_leader(task)) {
+ stats->ac_exitcode = task->exit_code;
+ if (task->flags & PF_FORKNOEXEC)
stats->ac_flag |= AFORK;
}
+
+ stats->ac_nice = task_nice(task);
+ stats->ac_sched = task->policy;
+ stats->ac_uid = task->uid;
+ stats->ac_gid = task->gid;
+ stats->ac_pid = task->pid;
+
+ strncpy(stats->ac_comm, task->comm, sizeof(stats->ac_comm));
+}
+
+void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
+{
if (tsk->flags & PF_SUPERPRIV)
stats->ac_flag |= ASU;
if (tsk->flags & PF_DUMPCORE)
stats->ac_flag |= ACORE;
if (tsk->flags & PF_SIGNALED)
stats->ac_flag |= AXSIG;
- stats->ac_nice = task_nice(tsk);
- stats->ac_sched = tsk->policy;
- stats->ac_uid = tsk->uid;
- stats->ac_gid = tsk->gid;
- stats->ac_pid = tsk->pid;
- rcu_read_lock();
- stats->ac_ppid = pid_alive(tsk) ?
- rcu_dereference(tsk->real_parent)->tgid : 0;
- rcu_read_unlock();
- stats->ac_utime = cputime_to_msecs(tsk->utime) * USEC_PER_MSEC;
- stats->ac_stime = cputime_to_msecs(tsk->stime) * USEC_PER_MSEC;
- stats->ac_utimescaled =
+
+ stats->ac_utime += cputime_to_msecs(tsk->utime) * USEC_PER_MSEC;
+ stats->ac_stime += cputime_to_msecs(tsk->stime) * USEC_PER_MSEC;
+ stats->ac_utimescaled +=
cputime_to_msecs(tsk->utimescaled) * USEC_PER_MSEC;
- stats->ac_stimescaled =
+ stats->ac_stimescaled +=
cputime_to_msecs(tsk->stimescaled) * USEC_PER_MSEC;
- stats->ac_minflt = tsk->min_flt;
- stats->ac_majflt = tsk->maj_flt;
-
- strncpy(stats->ac_comm, tsk->comm, sizeof(stats->ac_comm));
+ stats->ac_minflt += tsk->min_flt;
+ stats->ac_majflt += tsk->maj_flt;
}


@@ -80,32 +94,34 @@ void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
/*
* fill in extended accounting fields
*/
-void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
+void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
{
struct mm_struct *mm;

- /* convert pages-jiffies to Mbyte-usec */
- stats->coremem = jiffies_to_usecs(p->acct_rss_mem1) * PAGE_SIZE / MB;
- stats->virtmem = jiffies_to_usecs(p->acct_vm_mem1) * PAGE_SIZE / MB;
- mm = get_task_mm(p);
+ mm = get_task_mm(task);
if (mm) {
/* adjust to KB unit */
stats->hiwater_rss = mm->hiwater_rss * PAGE_SIZE / KB;
- stats->hiwater_vm = mm->hiwater_vm * PAGE_SIZE / KB;
+ stats->hiwater_vm = mm->hiwater_vm * PAGE_SIZE / KB;
mmput(mm);
}
- stats->read_char = p->rchar;
- stats->write_char = p->wchar;
- stats->read_syscalls = p->syscr;
- stats->write_syscalls = p->syscw;
+}
+
+void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
+{
+ /* convert pages-jiffies to Mbyte-usec */
+ stats->coremem += jiffies_to_usecs(p->acct_rss_mem1) * PAGE_SIZE / MB;
+ stats->virtmem += jiffies_to_usecs(p->acct_vm_mem1) * PAGE_SIZE / MB;
+
+ stats->read_char += p->rchar;
+ stats->write_char += p->wchar;
+ stats->read_syscalls += p->syscr;
+ stats->write_syscalls += p->syscw;
+
#ifdef CONFIG_TASK_IO_ACCOUNTING
- stats->read_bytes = p->ioac.read_bytes;
- stats->write_bytes = p->ioac.write_bytes;
- stats->cancelled_write_bytes = p->ioac.cancelled_write_bytes;
-#else
- stats->read_bytes = 0;
- stats->write_bytes = 0;
- stats->cancelled_write_bytes = 0;
+ stats->read_bytes += p->ioac.read_bytes;
+ stats->write_bytes += p->ioac.write_bytes;
+ stats->cancelled_write_bytes += p->ioac.cancelled_write_bytes;
#endif
}
#undef KB


2007-09-21 23:33:05

by Guillaume Chazarain

[permalink] [raw]
Subject: [PATCH 2/3] taskstats: tell fill_thread_group() whether it replies with PID or TGID stats

fill_thread_group() may want to know if it is filling TASKSTATS_CMD_ATTR_TGID
or TASKSTATS_CMD_ATTR_PID stats, so give it this information in the tg_stats
boolean.

Signed-off-by: Guillaume Chazarain <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Jonathan Lim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---

include/linux/tsacct_kern.h | 4 ++--
kernel/taskstats.c | 12 +++++++-----
kernel/tsacct.c | 3 ++-
3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
index 93dffc2..5652ae0 100644
--- a/include/linux/tsacct_kern.h
+++ b/include/linux/tsacct_kern.h
@@ -10,10 +10,10 @@
#include <linux/taskstats.h>

#ifdef CONFIG_TASKSTATS
-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task, bool tg_stats);
void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
#else
-static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task, bool tg_stats)
{}
static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
{}
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index ce43fae..42d3110 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -172,6 +172,7 @@ static void send_cpu_listeners(struct sk_buff *skb,
* fill_threadgroup - initialize some common stats for the thread group
* @stats: the taskstats to write into
* @task: the thread representing the whole group
+ * @tg_stats: whether in the end thread group stats are requested
*
* There are two types of taskstats fields when considering a thread group:
* - those that can be aggregated from each thread in the group (like CPU
@@ -181,7 +182,8 @@ static void send_cpu_listeners(struct sk_buff *skb,
* XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
* the second.
*/
-static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+ bool tg_stats)
{
/*
* Each accounting subsystem adds calls to its functions to initialize
@@ -193,7 +195,7 @@ static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
stats->version = TASKSTATS_VERSION;

/* fill in basic acct fields */
- bacct_fill_threadgroup(stats, task);
+ bacct_fill_threadgroup(stats, task, tg_stats);

/* fill in extended acct fields */
xacct_fill_threadgroup(stats, task);
@@ -248,7 +250,7 @@ static int fill_pid(pid_t pid, struct task_struct *tsk,

memset(stats, 0, sizeof(*stats));
add_tsk(stats, tsk);
- fill_threadgroup(stats, tsk);
+ fill_threadgroup(stats, tsk, false);

/* Define err: label here if needed */
put_task_struct(tsk);
@@ -289,7 +291,7 @@ static int fill_tgid(pid_t tgid, struct task_struct *first,
add_tsk(stats, tsk);
while_each_thread(first, tsk);

- fill_threadgroup(stats, first->group_leader);
+ fill_threadgroup(stats, first->group_leader, true);
unlock_task_sighand(first, &flags);
rc = 0;
out:
@@ -545,7 +547,7 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
*/

memcpy(stats, tsk->signal->stats, sizeof(*stats));
- fill_threadgroup(stats, tsk->group_leader);
+ fill_threadgroup(stats, tsk->group_leader, true);

send:
send_cpu_listeners(rep_skb, listeners);
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 9541a1a..24056aa 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -41,7 +41,8 @@ static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
* fill in basic accounting fields
*/

-void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
+void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
+ bool tg_stats)
{
BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);


2007-09-21 23:33:34

by Guillaume Chazarain

[permalink] [raw]
Subject: [PATCH 3/3] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code

Threads also have an exit code on their own, so report it in
TASKSTATS_CMD_ATTR_PID.

For TASKSTATS_CMD_ATTR_TGID, instead of relying only on the exit code of the
leader, we use task->signal->group_exit_code if not null as suggested by
Oleg Nesterov.

Also, document that as of this patch, fill_threadgroup() must be called after
add_tsk() as it may overwrite some stats.

Signed-off-by: Guillaume Chazarain <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Jay Lan <[email protected]>
Cc: Jonathan Lim <[email protected]>
Cc: Oleg Nesterov <[email protected]>
---

kernel/taskstats.c | 3 +++
kernel/tsacct.c | 12 +++++++-----
2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 42d3110..24d7f62 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -181,6 +181,9 @@ static void send_cpu_listeners(struct sk_buff *skb,
* memory usage), so are taken from the group leader.
* XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
* the second.
+ *
+ * fill_threadgroup() may overwrite stats from add_tsk(), so it must be called
+ * after add_tsk().
*/
static void fill_threadgroup(struct taskstats *stats, struct task_struct *task,
bool tg_stats)
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index 24056aa..526b134 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -44,6 +44,8 @@ static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,
bool tg_stats)
{
+ int group_exit_code;
+
BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);

rcu_read_lock();
@@ -53,11 +55,11 @@ void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task,

fill_wall_times(stats, task);

- if (thread_group_leader(task)) {
- stats->ac_exitcode = task->exit_code;
- if (task->flags & PF_FORKNOEXEC)
- stats->ac_flag |= AFORK;
- }
+ if (thread_group_leader(task) && (task->flags & PF_FORKNOEXEC))
+ stats->ac_flag |= AFORK;
+
+ group_exit_code = tg_stats ? task->signal->group_exit_code : 0;
+ stats->ac_exitcode = group_exit_code ? : task->exit_code;

stats->ac_nice = task_nice(task);
stats->ac_sched = task->policy;

2007-09-22 18:07:27

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones

Guillaume Chazarain wrote:
> TASKSTATS_CMD_ATTR_TGID used to return only the delay accounting stats, not
> the basic and extended accounting. With this patch,
> TASKSTATS_CMD_ATTR_TGID also aggregates the accounting info for all threads
> of a thread group.
>
> TASKSTATS_CMD_ATTR_PID output should be unchanged
> TASKSTATS_CMD_ATTR_TGID output should have all fields set, unlike before the
> patch where most of the fiels were set to 0.
>
> To this aim, two functions were introduced: fill_threadgroup() and add_tsk().
> These functions are responsible for aggregating the subsystem specific
> accounting information. Taskstats requesters (fill_pid(), fill_tgid() and
> fill_tgid_exit()) should only call add_tsk() and fill_threadgroup() to get the
> stats.
>
> Signed-off-by: Guillaume Chazarain <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Jay Lan <[email protected]>
> Cc: Jonathan Lim <[email protected]>
> Cc: Oleg Nesterov <[email protected]>
> ---
>
> Documentation/accounting/getdelays.c | 2 -
> include/linux/tsacct_kern.h | 12 ++-
> kernel/taskstats.c | 131 ++++++++++++++++++++++------------
> kernel/tsacct.c | 106 ++++++++++++++++------------
> 4 files changed, 155 insertions(+), 96 deletions(-)
>
> diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
> index cbee3a2..78773c0 100644
> --- a/Documentation/accounting/getdelays.c
> +++ b/Documentation/accounting/getdelays.c
> @@ -76,7 +76,7 @@ static void usage(void)
> fprintf(stderr, "getdelays [-dilv] [-w logfile] [-r bufsize] "
> "[-m cpumask] [-t tgid] [-p pid]\n");
> fprintf(stderr, " -d: print delayacct stats\n");
> - fprintf(stderr, " -i: print IO accounting (works only with -p)\n");
> + fprintf(stderr, " -i: print IO accounting\n");
> fprintf(stderr, " -l: listen forever\n");
> fprintf(stderr, " -v: debug on\n");
> }
> diff --git a/include/linux/tsacct_kern.h b/include/linux/tsacct_kern.h
> index 7e50ac7..93dffc2 100644
> --- a/include/linux/tsacct_kern.h
> +++ b/include/linux/tsacct_kern.h
> @@ -10,17 +10,23 @@
> #include <linux/taskstats.h>
>
> #ifdef CONFIG_TASKSTATS
> -extern void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk);
> +void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
> +void bacct_add_tsk(struct taskstats *stats, struct task_struct *task);
> #else
> -static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
> +static inline void bacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
> +{}
> +static inline void bacct_add_tsk(struct taskstats *stats, struct task_struct *task)
> {}
> #endif /* CONFIG_TASKSTATS */
>
> #ifdef CONFIG_TASK_XACCT
> -extern void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
> +void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task);
> +void xacct_add_tsk(struct taskstats *stats, struct task_struct *p);
> extern void acct_update_integrals(struct task_struct *tsk);
> extern void acct_clear_integrals(struct task_struct *tsk);
> #else
> +static inline void xacct_fill_threadgroup(struct taskstats *stats, struct task_struct *task)
> +{}
> static inline void xacct_add_tsk(struct taskstats *stats, struct task_struct *p)
> {}
> static inline void acct_update_integrals(struct task_struct *tsk)
> diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> index 059431e..ce43fae 100644
> --- a/kernel/taskstats.c
> +++ b/kernel/taskstats.c
> @@ -168,6 +168,68 @@ static void send_cpu_listeners(struct sk_buff *skb,
> up_write(&listeners->sem);
> }
>
> +/**
> + * fill_threadgroup - initialize some common stats for the thread group
> + * @stats: the taskstats to write into
> + * @task: the thread representing the whole group
> + *
> + * There are two types of taskstats fields when considering a thread group:
> + * - those that can be aggregated from each thread in the group (like CPU
> + * times),
> + * - those that cannot be aggregated (like UID) or are identical (like
> + * memory usage), so are taken from the group leader.
> + * XXX_threadgroup() methods deal with the first type while XXX_add_tsk() with
> + * the second.
> + */
> +static void fill_threadgroup(struct taskstats *stats, struct task_struct *task)
> +{

How about calling this one fill_threadgroup_stats()?

> + /*
> + * Each accounting subsystem adds calls to its functions to initialize
> + * relevant parts of struct taskstsats for a single tgid as follows:
> + *
> + * per-task-foo-fill_threadgroup(stats, task);
> + */
> +
> + stats->version = TASKSTATS_VERSION;
> +
> + /* fill in basic acct fields */
> + bacct_fill_threadgroup(stats, task);
> +
> + /* fill in extended acct fields */
> + xacct_fill_threadgroup(stats, task);
> +}
> +
> +/**
> + * add_tsk - combine some thread specific stats in a taskstats
> + * @stats: the taskstats to write into
> + * @task: the thread to combine
> + *
> + * Stats specific to each thread in the thread group. Stats of @task should be
> + * combined with those already present in @stats. add_tsk() works in
> + * conjunction with fill_threadgroup(), taskstats fields should not be touched
> + * by both functions.
> + */
> +static void add_tsk(struct taskstats *stats, struct task_struct *task)
> +{

How about we call function add_tsk_stats()?

> + /*
> + * Each accounting subsystem adds calls to its functions to combine
> + * relevant parts of struct taskstsats for a single pid as follows:
> + *
> + * per-task-foo-add_tsk(stats, task);
> + */
> + stats->nvcsw += task->nvcsw;
> + stats->nivcsw += task->nivcsw;
> +
> + /* fill in delay acct fields */
> + delayacct_add_tsk(stats, task);
> +
> + /* fill in basic acct fields */
> + bacct_add_tsk(stats, task);
> +
> + /* fill in extended acct fields */
> + xacct_add_tsk(stats, task);
> +}
> +
> static int fill_pid(pid_t pid, struct task_struct *tsk,
> struct taskstats *stats)
> {
> @@ -185,23 +247,8 @@ static int fill_pid(pid_t pid, struct task_struct *tsk,
> get_task_struct(tsk);
>
> memset(stats, 0, sizeof(*stats));
> - /*
> - * Each accounting subsystem adds calls to its functions to
> - * fill in relevant parts of struct taskstsats as follows
> - *
> - * per-task-foo(stats, tsk);
> - */
> -
> - delayacct_add_tsk(stats, tsk);
> -
> - /* fill in basic acct fields */
> - stats->version = TASKSTATS_VERSION;
> - stats->nvcsw = tsk->nvcsw;
> - stats->nivcsw = tsk->nivcsw;
> - bacct_add_tsk(stats, tsk);
> -
> - /* fill in extended acct fields */
> - xacct_add_tsk(stats, tsk);
> + add_tsk(stats, tsk);
> + fill_threadgroup(stats, tsk);
>

So we always call fill_threadgroup later, is there a reason for
that. Can the order of calls be arbitrary or is there a dependency?

> /* Define err: label here if needed */
> put_task_struct(tsk);
> @@ -233,31 +280,20 @@ static int fill_tgid(pid_t tgid, struct task_struct *first,
> memset(stats, 0, sizeof(*stats));
>
> tsk = first;
> - do {
> - if (tsk->exit_state)
> - continue;
> - /*
> - * Accounting subsystem can call its functions here to
> - * fill in relevant parts of struct taskstsats as follows
> - *
> - * per-task-foo(stats, tsk);
> - */
> - delayacct_add_tsk(stats, tsk);
> -
> - stats->nvcsw += tsk->nvcsw;
> - stats->nivcsw += tsk->nivcsw;
> - } while_each_thread(first, tsk);
> -
> + do
> + if (!tsk->exit_state)
> + /*
> + * This check is racy as a thread could exit just right
> + * now and have its statistics accounted twice.
> + */
> + add_tsk(stats, tsk);
> + while_each_thread(first, tsk);
> +

I still prefer braces around do <--> while, I think the code is easier
to read with them.

> + fill_threadgroup(stats, first->group_leader);
> unlock_task_sighand(first, &flags);
> rc = 0;
> out:
> rcu_read_unlock();
> -
> - stats->version = TASKSTATS_VERSION;
> - /*
> - * Accounting subsytems can also add calls here to modify
> - * fields of taskstats.
> - */
> return rc;
> }
>
> @@ -267,19 +303,14 @@ static void fill_tgid_exit(struct task_struct *tsk)
> unsigned long flags;
>
> spin_lock_irqsave(&tsk->sighand->siglock, flags);
> - if (!tsk->signal->stats)
> - goto ret;
>
> /*
> - * Each accounting subsystem calls its functions here to
> - * accumalate its per-task stats for tsk, into the per-tgid structure
> - *
> - * per-task-foo(tsk->signal->stats, tsk);
> + * The fill_threadgroup() part of the statistics will be added by the
> + * stats requester, i.e. fill_tgid() or taskstats_exit()
> */
> - delayacct_add_tsk(tsk->signal->stats, tsk);
> -ret:
> + add_tsk(tsk->signal->stats, tsk);
> +
> spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
> - return;
> }
>
> static int add_del_listener(pid_t pid, cpumask_t *maskp, int isadd)
> @@ -508,7 +539,13 @@ void taskstats_exit(struct task_struct *tsk, int group_dead)
> if (!stats)
> goto err;
>
> + /*
> + * Race here: the last thread may have not finished its
> + * fill_tgid_exit(), leaving an incomplete tsk->signal->stats
> + */
> +
> memcpy(stats, tsk->signal->stats, sizeof(*stats));
> + fill_threadgroup(stats, tsk->group_leader);
>
> send:
> send_cpu_listeners(rep_skb, listeners);
> diff --git a/kernel/tsacct.c b/kernel/tsacct.c
> index 4ab1b58..9541a1a 100644
> --- a/kernel/tsacct.c
> +++ b/kernel/tsacct.c
> @@ -22,54 +22,68 @@
> #include <linux/acct.h>
> #include <linux/jiffies.h>
>
> -/*
> - * fill in basic accounting fields
> - */


Could we further split the tsacct and delayacct/taskstats patches.

> -void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
> +static void fill_wall_times(struct taskstats *stats, struct task_struct *tsk)
> {
> struct timespec uptime, ts;
> s64 ac_etime;
>
> - BUILD_BUG_ON(TS_COMM_LEN < TASK_COMM_LEN);
> -
> /* calculate task elapsed time in timespec */
> do_posix_clock_monotonic_gettime(&uptime);
> ts = timespec_sub(uptime, tsk->start_time);
> /* rebase elapsed time to usec */
> ac_etime = timespec_to_ns(&ts);
> do_div(ac_etime, NSEC_PER_USEC);
> - stats->ac_etime = ac_etime;
> - stats->ac_btime = get_seconds() - ts.tv_sec;
> - if (thread_group_leader(tsk)) {
> - stats->ac_exitcode = tsk->exit_code;
> - if (tsk->flags & PF_FORKNOEXEC)
> + stats->ac_etime = ac_etime;
> + stats->ac_btime = get_seconds() - ts.tv_sec;
> +}
> +
> +/*
> + * fill in basic accounting fields
> + */
> +


--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

2007-09-26 17:34:55

by Guillaume Chazarain

[permalink] [raw]
Subject: Re: [PATCH 1/3] taskstats: separate PID/TGID stats producers to complete the TGID ones

Le Sat, 22 Sep 2007 23:36:29 +0530,
Balbir Singh <[email protected]> a écrit :

[reordered]

> How about calling this one fill_threadgroup_stats()?
> How about we call function add_tsk_stats()?
> I still prefer braces around do <--> while, I think the code is easier
> to read with them.
> Could we further split the tsacct and delayacct/taskstats patches.

Hi Balbir,

Thank for your review, hopefully I addressed all your items in this
series.

> So we always call fill_threadgroup later, is there a reason for
> that. Can the order of calls be arbitrary or is there a dependency?

Yes, as you saw there is this dependency, it is needed only for
[PATCH 7/8] taskstats: fix stats->ac_exitcode to work on threads and use group_exit_code
but I found it simpler to put the calls in the right order from the
beginning.

Thanks again, and sorry for the double mail bomb.

--
Guillaume