2018-06-22 07:17:15

by Xunlei Pang

[permalink] [raw]
Subject: [PATCH] sched/cputime: Ensure correct utime and stime proportion

We use per-cgroup cpu usage statistics similar to "cgroup rstat",
and encountered a problem that user and sys usages are wrongly
split sometimes.

Run tasks with some random run-sleep pattern for a long time, and
when tick-based time and scheduler sum_exec_runtime hugely drifts
apart(scheduler sum_exec_runtime is less than tick-based time),
the current implementation of cputime_adjust() will produce less
sys usage than the actual use after changing to run a different
workload pattern with high sys. This is because total tick-based
utime and stime are used to split the total sum_exec_runtime.

Same problem exists on utime and stime from "/proc/<pid>/stat".

[Example]
Run some random run-sleep patterns for minutes, then change to run
high sys pattern, and watch.
1) standard "top"(which is the correct one):
4.6 us, 94.5 sy, 0.0 ni, 0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) our tool parsing utime and stime from "/proc/<pid>/stat":
20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys

This patch fixes the issue by calculating using all kinds of time
elapsed since last parse in cputime_adjust(), and accumulate the
corresponding results calculated into prev_cputime. A new field
of task_cputime type is added in structure prev_cputime to record
previous task_cputime so that we can get the elapsed time deltas.

Signed-off-by: Xunlei Pang <[email protected]>
---
include/linux/sched.h | 33 +++++++++++------------
include/linux/sched/cputime.h | 12 ++++++++-
kernel/sched/cputime.c | 61 ++++++++++++++++---------------------------
3 files changed, 51 insertions(+), 55 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 87bf02d93a27..5108ac8414e0 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -223,6 +223,22 @@ extern void io_schedule_finish(int token);
extern long io_schedule_timeout(long timeout);
extern void io_schedule(void);

+/**
+ * struct task_cputime - collected CPU time counts
+ * @utime: time spent in user mode, in nanoseconds
+ * @stime: time spent in kernel mode, in nanoseconds
+ * @sum_exec_runtime: total time spent on the CPU, in nanoseconds
+ *
+ * This structure groups together three kinds of CPU time that are tracked for
+ * threads and thread groups. Most things considering CPU time want to group
+ * these counts together and treat all three of them in parallel.
+ */
+struct task_cputime {
+ u64 utime;
+ u64 stime;
+ unsigned long long sum_exec_runtime;
+};
+
/**
* struct prev_cputime - snapshot of system and user cputime
* @utime: time spent in user mode
@@ -236,26 +252,11 @@ struct prev_cputime {
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
u64 utime;
u64 stime;
+ struct task_cputime cputime;
raw_spinlock_t lock;
#endif
};

-/**
- * struct task_cputime - collected CPU time counts
- * @utime: time spent in user mode, in nanoseconds
- * @stime: time spent in kernel mode, in nanoseconds
- * @sum_exec_runtime: total time spent on the CPU, in nanoseconds
- *
- * This structure groups together three kinds of CPU time that are tracked for
- * threads and thread groups. Most things considering CPU time want to group
- * these counts together and treat all three of them in parallel.
- */
-struct task_cputime {
- u64 utime;
- u64 stime;
- unsigned long long sum_exec_runtime;
-};
-
/* Alternate field names when used on cache expirations: */
#define virt_exp utime
#define prof_exp stime
diff --git a/include/linux/sched/cputime.h b/include/linux/sched/cputime.h
index 53f883f5a2fd..49f8fd2564ed 100644
--- a/include/linux/sched/cputime.h
+++ b/include/linux/sched/cputime.h
@@ -175,10 +175,20 @@ static inline void account_group_exec_runtime(struct task_struct *tsk,
atomic64_add(ns, &cputimer->cputime_atomic.sum_exec_runtime);
}

-static inline void prev_cputime_init(struct prev_cputime *prev)
+static inline void prev_cputime_clear(struct prev_cputime *prev)
{
#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
prev->utime = prev->stime = 0;
+ prev->cputime.utime = 0;
+ prev->cputime.stime = 0;
+ prev->cputime.sum_exec_runtime = 0;
+#endif
+}
+
+static inline void prev_cputime_init(struct prev_cputime *prev)
+{
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+ prev_cputime_clear(prev);
raw_spin_lock_init(&prev->lock);
#endif
}
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 0796f938c4f0..a68483ee3ad7 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -590,69 +590,54 @@ static u64 scale_stime(u64 stime, u64 rtime, u64 total)
void cputime_adjust(struct task_cputime *curr, struct prev_cputime *prev,
u64 *ut, u64 *st)
{
- u64 rtime, stime, utime;
+ u64 rtime_delta, stime_delta, utime_delta;
unsigned long flags;

/* Serialize concurrent callers such that we can honour our guarantees */
raw_spin_lock_irqsave(&prev->lock, flags);
- rtime = curr->sum_exec_runtime;

/*
* This is possible under two circumstances:
- * - rtime isn't monotonic after all (a bug);
+ * - task_cputime isn't monotonic after all (a bug);
* - we got reordered by the lock.
*
* In both cases this acts as a filter such that the rest of the code
* can assume it is monotonic regardless of anything else.
*/
- if (prev->stime + prev->utime >= rtime)
+ if (prev->cputime.utime > curr->utime ||
+ prev->cputime.stime > curr->stime ||
+ prev->cputime.sum_exec_runtime >= curr->sum_exec_runtime)
goto out;

- stime = curr->stime;
- utime = curr->utime;
+ stime_delta = curr->stime - prev->cputime.stime;
+ utime_delta = curr->utime - prev->cputime.utime;
+ rtime_delta = curr->sum_exec_runtime - prev->cputime.sum_exec_runtime;

/*
- * If either stime or utime are 0, assume all runtime is userspace.
- * Once a task gets some ticks, the monotonicy code at 'update:'
- * will ensure things converge to the observed ratio.
+ * If either stime or utime increase are 0, assume all runtime
+ * is userspace. Once a task gets some ticks, the monotonicy code
+ * at 'update:' will ensure things converge to the observed ratio.
*/
- if (stime == 0) {
- utime = rtime;
+ if (stime_delta == 0) {
+ utime_delta = rtime_delta;
goto update;
}

- if (utime == 0) {
- stime = rtime;
+ if (utime_delta == 0) {
+ stime_delta = rtime_delta;
goto update;
}

- stime = scale_stime(stime, rtime, stime + utime);
+ stime_delta = scale_stime(stime_delta, rtime_delta,
+ stime_delta + utime_delta);
+ if (stime_delta > rtime_delta)
+ stime_delta = rtime_delta;
+ utime_delta = rtime_delta - stime_delta;

update:
- /*
- * Make sure stime doesn't go backwards; this preserves monotonicity
- * for utime because rtime is monotonic.
- *
- * utime_i+1 = rtime_i+1 - stime_i
- * = rtime_i+1 - (rtime_i - utime_i)
- * = (rtime_i+1 - rtime_i) + utime_i
- * >= utime_i
- */
- if (stime < prev->stime)
- stime = prev->stime;
- utime = rtime - stime;
-
- /*
- * Make sure utime doesn't go backwards; this still preserves
- * monotonicity for stime, analogous argument to above.
- */
- if (utime < prev->utime) {
- utime = prev->utime;
- stime = rtime - utime;
- }
-
- prev->stime = stime;
- prev->utime = utime;
+ prev->cputime = *curr;
+ prev->utime += utime_delta;
+ prev->stime += stime_delta;
out:
*ut = prev->utime;
*st = prev->stime;
--
2.14.1.40.g8e62ba1



2018-06-22 10:21:37

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

Hi Xunlei,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on v4.18-rc1 next-20180622]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
mm/mempool.c:228: warning: Function parameter or member 'pool' not described in 'mempool_init'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev'
include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev'
include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
include/net/mac80211.h:955: warning: Function parameter or member 'control.rates' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.use_rts' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.use_cts_prot' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.short_preamble' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.skip_table' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.jiffies' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.vif' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.hw_key' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.flags' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'control.enqueue_time' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'ack' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'ack.cookie' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info'
include/net/mac80211.h:955: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info'
>> include/linux/sched.h:259: warning: Function parameter or member 'cputime' not described in 'prev_cputime'
kernel/sched/fair.c:3760: warning: Function parameter or member 'flags' not described in 'attach_entity_load_avg'
include/linux/device.h:93: warning: bad line: this bus.
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.cb' not described in 'dma_buf'
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.poll' not described in 'dma_buf'
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_excl.active' not described in 'dma_buf'
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.cb' not described in 'dma_buf'
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.poll' not described in 'dma_buf'
include/linux/dma-buf.h:307: warning: Function parameter or member 'cb_shared.active' not described in 'dma_buf'
include/linux/dma-fence-array.h:54: warning: Function parameter or member 'work' not described in 'dma_fence_array'
include/linux/gpio/driver.h:142: warning: Function parameter or member 'request_key' not described in 'gpio_irq_chip'
include/linux/iio/hw-consumer.h:1: warning: no structured comments found
include/linux/device.h:94: warning: bad line: this bus.
include/linux/input/sparse-keymap.h:46: warning: Function parameter or member 'sw' not described in 'key_entry'
include/linux/regulator/driver.h:227: warning: Function parameter or member 'resume_early' not described in 'regulator_ops'
drivers/regulator/core.c:4465: warning: Excess function parameter 'state' description in 'regulator_suspend_late'
arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw0' not described in 'irb'
arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw1' not described in 'irb'
arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw2' not described in 'irb'
arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.esw3' not described in 'irb'
arch/s390/include/asm/cio.h:245: warning: Function parameter or member 'esw.eadm' not described in 'irb'
drivers/usb/dwc3/gadget.c:510: warning: Excess function parameter 'dwc' description in 'dwc3_gadget_start_config'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_pin' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_unpin' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_res_obj' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_get_sg_table' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_import_sg_table' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vmap' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_vunmap' not described in 'drm_driver'
include/drm/drm_drv.h:610: warning: Function parameter or member 'gem_prime_mmap' not described in 'drm_driver'
drivers/gpu/drm/i915/i915_vma.h:48: warning: cannot understand function prototype: 'struct i915_vma '
drivers/gpu/drm/i915/i915_vma.h:1: warning: no structured comments found
include/drm/tinydrm/tinydrm.h:34: warning: Function parameter or member 'fb_dirty' not described in 'tinydrm_device'
drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'crtc_state' not described in 'mipi_dbi_enable_flush'
drivers/gpu/drm/tinydrm/mipi-dbi.c:272: warning: Function parameter or member 'plane_state' not described in 'mipi_dbi_enable_flush'

vim +259 include/linux/sched.h

d37f761db Frederic Weisbecker 2012-11-22 @259

:::::: The code at line 259 was first introduced by commit
:::::: d37f761dbd276790f70dcf73a287fde2c3464482 cputime: Consolidate cputime adjustment code

:::::: TO: Frederic Weisbecker <[email protected]>
:::::: CC: Frederic Weisbecker <[email protected]>

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (11.53 kB)
.config.gz (6.30 kB)
Download all attachments

2018-06-25 08:16:41

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 6/22/18 6:35 PM, kbuild test robot wrote:
> Hi Xunlei,
>
> Thank you for the patch! Perhaps something to improve:
>
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on v4.18-rc1 next-20180622]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> url: https://github.com/0day-ci/linux/commits/Xunlei-Pang/sched-cputime-Ensure-correct-utime-and-stime-proportion/20180622-153720
> reproduce: make htmldocs
>
> All warnings (new ones prefixed by >>):
>
> WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
> mm/mempool.c:228: warning: Function parameter or member 'pool' not described in 'mempool_init'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev'
> include/net/cfg80211.h:4279: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev'
> include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.units_pos' not described in 'ieee80211_hw'
> include/net/mac80211.h:2282: warning: Function parameter or member 'radiotap_timestamp.accuracy' not described in 'ieee80211_hw'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.rates' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.rts_cts_rate_idx' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.use_rts' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.use_cts_prot' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.short_preamble' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.skip_table' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.jiffies' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.vif' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.hw_key' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.flags' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'control.enqueue_time' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'ack' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'ack.cookie' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.rates' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.ack_signal' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_ack_len' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.ampdu_len' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.antenna' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.tx_time' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.is_valid_ack_signal' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'status.status_driver_data' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'driver_rates' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'pad' not described in 'ieee80211_tx_info'
> include/net/mac80211.h:955: warning: Function parameter or member 'rate_driver_data' not described in 'ieee80211_tx_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.signal' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'rx_stats_avg.chain_signal' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.filtered' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_failed' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.retry_count' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.lost_packets' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_tdls_pkt_time' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_retries' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.msdu_failed' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.last_ack_signal' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.ack_signal_filled' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'status_stats.avg_ack_signal' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.packets' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.bytes' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.last_rate' not described in 'sta_info'
> net/mac80211/sta_info.h:588: warning: Function parameter or member 'tx_stats.msdu' not described in 'sta_info'
>>> include/linux/sched.h:259: warning: Function parameter or member 'cputime' not described in 'prev_cputime'

OK, I can update its description above the structure definition next
version.

2018-06-26 12:22:07

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 6/22/18 3:15 PM, Xunlei Pang wrote:
> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> and encountered a problem that user and sys usages are wrongly
> split sometimes.
>
> Run tasks with some random run-sleep pattern for a long time, and
> when tick-based time and scheduler sum_exec_runtime hugely drifts
> apart(scheduler sum_exec_runtime is less than tick-based time),
> the current implementation of cputime_adjust() will produce less
> sys usage than the actual use after changing to run a different
> workload pattern with high sys. This is because total tick-based
> utime and stime are used to split the total sum_exec_runtime.
>
> Same problem exists on utime and stime from "/proc/<pid>/stat".
>
> [Example]
> Run some random run-sleep patterns for minutes, then change to run
> high sys pattern, and watch.
> 1) standard "top"(which is the correct one):
> 4.6 us, 94.5 sy, 0.0 ni, 0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> 2) our tool parsing utime and stime from "/proc/<pid>/stat":
> 20.5 usr, 78.4 sys
> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> gradually with time: 9.7 usr, 89.5 sys
>
High sys probably means there's something abnormal on the kernel
path, it may hide issues, so we should make it fairly reliable.
It can easily hit this problem with our per-cgroup statistics.

Hi Peter, any comment on this patch?

2018-06-26 15:50:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
> On 6/22/18 3:15 PM, Xunlei Pang wrote:
> > We use per-cgroup cpu usage statistics similar to "cgroup rstat",
> > and encountered a problem that user and sys usages are wrongly
> > split sometimes.
> >
> > Run tasks with some random run-sleep pattern for a long time, and
> > when tick-based time and scheduler sum_exec_runtime hugely drifts
> > apart(scheduler sum_exec_runtime is less than tick-based time),
> > the current implementation of cputime_adjust() will produce less
> > sys usage than the actual use after changing to run a different
> > workload pattern with high sys. This is because total tick-based
> > utime and stime are used to split the total sum_exec_runtime.
> >
> > Same problem exists on utime and stime from "/proc/<pid>/stat".
> >
> > [Example]
> > Run some random run-sleep patterns for minutes, then change to run
> > high sys pattern, and watch.
> > 1) standard "top"(which is the correct one):
> > 4.6 us, 94.5 sy, 0.0 ni, 0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > 2) our tool parsing utime and stime from "/proc/<pid>/stat":
> > 20.5 usr, 78.4 sys
> > We can see "20.5 usr" displayed in 2) was incorrect, it recovers
> > gradually with time: 9.7 usr, 89.5 sys
> >
> High sys probably means there's something abnormal on the kernel
> path, it may hide issues, so we should make it fairly reliable.
> It can easily hit this problem with our per-cgroup statistics.
>
> Hi Peter, any comment on this patch?

Well, no, because the Changelog is incomprehensible and the patch
doesn't really have useful comments, so I'll have to reverse engineer
the entire thing, and I've just not had time for that.

2018-06-27 12:23:54

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 6/26/18 11:49 PM, Peter Zijlstra wrote:
> On Tue, Jun 26, 2018 at 08:19:49PM +0800, Xunlei Pang wrote:
>> On 6/22/18 3:15 PM, Xunlei Pang wrote:
>>> We use per-cgroup cpu usage statistics similar to "cgroup rstat",
>>> and encountered a problem that user and sys usages are wrongly
>>> split sometimes.
>>>
>>> Run tasks with some random run-sleep pattern for a long time, and
>>> when tick-based time and scheduler sum_exec_runtime hugely drifts
>>> apart(scheduler sum_exec_runtime is less than tick-based time),
>>> the current implementation of cputime_adjust() will produce less
>>> sys usage than the actual use after changing to run a different
>>> workload pattern with high sys. This is because total tick-based
>>> utime and stime are used to split the total sum_exec_runtime.
>>>
>>> Same problem exists on utime and stime from "/proc/<pid>/stat".
>>>
>>> [Example]
>>> Run some random run-sleep patterns for minutes, then change to run
>>> high sys pattern, and watch.
>>> 1) standard "top"(which is the correct one):
>>> 4.6 us, 94.5 sy, 0.0 ni, 0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
>>> 2) our tool parsing utime and stime from "/proc/<pid>/stat":
>>> 20.5 usr, 78.4 sys
>>> We can see "20.5 usr" displayed in 2) was incorrect, it recovers
>>> gradually with time: 9.7 usr, 89.5 sys
>>>
>> High sys probably means there's something abnormal on the kernel
>> path, it may hide issues, so we should make it fairly reliable.
>> It can easily hit this problem with our per-cgroup statistics.
>>
>> Hi Peter, any comment on this patch?
>
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.
>

Let me try the best to describe it again.

There are two types of run time for a process:
1) task_struct::utime, task_struct::stime in ticks.
2) scheduler task_struct::se.sum_exec_runtime(rtime) in ns.

In case of no vtime accounting, the utime/stime fileds of
/proc/pid/stat are calculated by cputime_adjust(), which
splits the precise rtime in the proportion of tick-based
utime and stime.

However cputime_adjust() always does the split using the
total utime/stime of the process, this may cause wrong
splitting in some cases, e.g.

A typical statistic collector accesses "/proc/pid/stat".
1) moment t0
After accessed /proc/pid/stat in t0:
tick-based whole utime is utime_0, tick-based whole stime
is stime_0, scheduler time is rtime_0. The ajusted utime
caculated by cputime_adjust() is autime_0, ajusted stime
is astime_0, so astime_0=rtime_0*stime_0/(utime_0+stime_0).

For a long time, the process runs mainly in userspace with
run-sleep patterns, and because two different clocks, it
is possible to have the following condition:
rtime_0 < utime_0 (as with little stime_0)

2) moment t1(after dt, i.e. t0+dt)
Then the process suddenly runs 100% sys workload afterwards
lasting "dt", when accessing /proc/pid/stat at t1="t0+dt",
both rtime_0 and stime_0 increase "dt", thus cputime_ajust()
does the calculation for new adjusted astime_1 as follows:
(rtime_0+dt)*(stime_0+dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt+stime_0*dt+dt*dt)/(utime_0+stime_0+dt)
= (rtime_0*stime_0+rtime_0*dt-utime_0*dt)/(utime_0+stime_0+dt) + dt
< rtime_0*stime_0/(utime_0+stime_0+dt) + dt (for rtime_0 < utime_0)
< rtime_0*stime_0/(utime_0+stime_0) + dt
< astime_0+dt

The actual astime_1 should be "astime_0+dt"(as it runs 100%
sys during dt), but the caculated figure by cputime_adjust()
becomes much smaller, as a result the statistics collector
shows less cpu sys usage than the actual one.

That's why we occasionally observed the incorrect cpu usage
described in the changelog:
[Example]
Run some random run-sleep patterns for minutes, then change
to run high sys pattern, and watch.
1) standard "top"(which is the correct one):
4.6 us, 94.5 sy, 0.0 ni, 0.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
2) Parsing utime and stime from "/proc/<pid>/stat":
20.5 usr, 78.4 sys
We can see "20.5 usr" displayed in 2) was incorrect, it recovers
gradually with time: 9.7 usr, 89.5 sys


Thanks,
Xunlei

2018-07-02 15:23:23

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

Hello, Peter.

On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
> Well, no, because the Changelog is incomprehensible and the patch
> doesn't really have useful comments, so I'll have to reverse engineer
> the entire thing, and I've just not had time for that.

Just as an additional data point, we also sometimes see artifacts from
cpu_adjust_time() in the form of per-task user or sys time getting
stuck for some period (in extreme cases for over a minute) while the
application isn't doing anything differently. We're telling the users
that it's an inherent sampling artifact but it'd be nice to improve it
if possible without adding noticeable overhead. No idea whether this
patch's approach is a good one tho.

Thanks.

--
tejun

2018-07-04 06:57:59

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 7/2/18 11:21 PM, Tejun Heo wrote:
> Hello, Peter.
>
> On Tue, Jun 26, 2018 at 05:49:08PM +0200, Peter Zijlstra wrote:
>> Well, no, because the Changelog is incomprehensible and the patch
>> doesn't really have useful comments, so I'll have to reverse engineer
>> the entire thing, and I've just not had time for that.
>
> Just as an additional data point, we also sometimes see artifacts from
> cpu_adjust_time() in the form of per-task user or sys time getting
> stuck for some period (in extreme cases for over a minute) while the
> application isn't doing anything differently. We're telling the users
> that it's an inherent sampling artifact but it'd be nice to improve it
> if possible without adding noticeable overhead. No idea whether this
> patch's approach is a good one tho.

The patch has no noticeable overhead except the extra cputime fileds
added into task_struct. We've been running this patch on our servers
for months, looks good till now.

Thanks!

2018-07-05 10:47:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> tick-based whole utime is utime_0, tick-based whole stime
> is stime_0, scheduler time is rtime_0.

> For a long time, the process runs mainly in userspace with
> run-sleep patterns, and because two different clocks, it
> is possible to have the following condition:
> rtime_0 < utime_0 (as with little stime_0)

I don't follow... what?

Why are you, and why do you think it makes sense to, compare rtime_0
against utime_0 ?

The [us]time_0 are, per your earlier definition, ticks. They're not an
actual measure of time. Do not compare the two, that makes no bloody
sense.

2018-07-05 13:23:11

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>> tick-based whole utime is utime_0, tick-based whole stime
>> is stime_0, scheduler time is rtime_0.
>
>> For a long time, the process runs mainly in userspace with
>> run-sleep patterns, and because two different clocks, it
>> is possible to have the following condition:
>> rtime_0 < utime_0 (as with little stime_0)
>
> I don't follow... what?
>
> Why are you, and why do you think it makes sense to, compare rtime_0
> against utime_0 ?
>
> The [us]time_0 are, per your earlier definition, ticks. They're not an
> actual measure of time. Do not compare the two, that makes no bloody
> sense.
>

[us]time_0 is task_struct:utime{stime}, I cited directly from
cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
here to simple the following proof to help explain the problem we met.

2018-07-05 13:43:16

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
> > On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
> >> tick-based whole utime is utime_0, tick-based whole stime
> >> is stime_0, scheduler time is rtime_0.
> >
> >> For a long time, the process runs mainly in userspace with
> >> run-sleep patterns, and because two different clocks, it
> >> is possible to have the following condition:
> >> rtime_0 < utime_0 (as with little stime_0)
> >
> > I don't follow... what?
> >
> > Why are you, and why do you think it makes sense to, compare rtime_0
> > against utime_0 ?
> >
> > The [us]time_0 are, per your earlier definition, ticks. They're not an
> > actual measure of time. Do not compare the two, that makes no bloody
> > sense.
> >
>
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.

In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
actual durations. Yes, the happen to be accounted in multiples of
TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
durations, they are samples.

(we just happen to store them in a [ns] unit because for
VIRT_CPU_ACCOUNTING they are in fact durations)

If 'rtime < utime' is not a valid assumption to build a problem on for
!VIRT_CPU_ACCOUNTING.


So please try again, so far you're not making any sense.

2018-07-05 13:59:56

by xunlei

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On 7/5/18 9:42 PM, Peter Zijlstra wrote:
> On Thu, Jul 05, 2018 at 09:21:15PM +0800, Xunlei Pang wrote:
>> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>>>> tick-based whole utime is utime_0, tick-based whole stime
>>>> is stime_0, scheduler time is rtime_0.
>>>
>>>> For a long time, the process runs mainly in userspace with
>>>> run-sleep patterns, and because two different clocks, it
>>>> is possible to have the following condition:
>>>> rtime_0 < utime_0 (as with little stime_0)
>>>
>>> I don't follow... what?
>>>
>>> Why are you, and why do you think it makes sense to, compare rtime_0
>>> against utime_0 ?
>>>
>>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>>> actual measure of time. Do not compare the two, that makes no bloody
>>> sense.
>>>
>>
>> [us]time_0 is task_struct:utime{stime}, I cited directly from
>> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
>> here to simple the following proof to help explain the problem we met.
>
> In the !VIRT_CPU_ACCOUNTING case they (task_struct::[us]time) are not
> actual durations. Yes, the happen to be accounted in multiples of
> TICK_NSEC and thereby happen to carry a [ns] unit, but they are not
> durations, they are samples.
>
> (we just happen to store them in a [ns] unit because for
> VIRT_CPU_ACCOUNTING they are in fact durations)
>
> If 'rtime < utime' is not a valid assumption to build a problem on for
> !VIRT_CPU_ACCOUNTING.
>

It is rtime < utime + stime, that is the imprecise tick-based run time
may be larger than precise sum_exec_runtime scheduler-based run time, it
can happen with some frequent run-sleep patterns.

Because stime is usually very small, so it is possible to have rtime <
utime.

>
> So please try again, so far you're not making any sense.
>

I also had a reproducer to verify this patch, can attach it tomorrow.

2018-07-09 05:55:27

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

Hi Peter,

On 7/5/18 9:21 PM, Xunlei Pang wrote:
> On 7/5/18 6:46 PM, Peter Zijlstra wrote:
>> On Wed, Jun 27, 2018 at 08:22:42PM +0800, Xunlei Pang wrote:
>>> tick-based whole utime is utime_0, tick-based whole stime
>>> is stime_0, scheduler time is rtime_0.
>>
>>> For a long time, the process runs mainly in userspace with
>>> run-sleep patterns, and because two different clocks, it
>>> is possible to have the following condition:
>>> rtime_0 < utime_0 (as with little stime_0)
>>
>> I don't follow... what?
>>
>> Why are you, and why do you think it makes sense to, compare rtime_0
>> against utime_0 ?
>>
>> The [us]time_0 are, per your earlier definition, ticks. They're not an
>> actual measure of time. Do not compare the two, that makes no bloody
>> sense.
>>
>
> [us]time_0 is task_struct:utime{stime}, I cited directly from
> cputime_adjust(), both in nanoseconds. I assumed "rtime_0 < utime_0"
> here to simple the following proof to help explain the problem we met.
>

Please see the enclosure for the reproducer cputime_adjust.tgz
(process_top.sh, usr_sys.c):
gcc usr_sys.c -o usr_sys
Firstly, the function consume_sys() in usr_sys.c yields 100% sys
which can be verified as follows:
$ taskset -c 0 ./usr_sys 1
$ ./process_top.sh $(pidof usr_sys)
0.0 usr, 100.0 sys
0.0 usr, 100.0 sys


Tested on my local box on 4.17.0 by executing "taskset -c 0 ./usr_sys",
then executing "./process_top.sh $(pidof usr_sys)" to watch.
1) Before this patch
50.0 usr, 0.0 sys
50.0 usr, 1.0 sys
50.0 usr, 0.0 sys
50.0 usr, 0.0 sys
49.0 usr, 4.0 sys //switch to consume 100% sys, ignore this line
12.0 usr, 88.0 sys
11.0 usr, 89.0 sys
10.0 usr, 90.0 sys
10.0 usr, 90.0 sys
9.0 usr, 91.0 sys
8.0 usr, 91.0 sys

Obviously there were around 10% sys wrongly goes to usr

2) After this patch
50.0 usr, 0.0 sys
50.0 usr, 0.0 sys
50.0 usr, 0.0 sys
50.0 usr, 0.0 sys
11.0 usr, 76.0 sys //switch to consume 100% sys, ignore this line
1.0 usr, 100.0 sys
0.0 usr, 100.0 sys
1.0 usr, 100.0 sys
0.0 usr, 100.0 sys
0.0 usr, 100.0 sys

So it displayed the correct result as we expected after this patch.

Thanks,
Xunlei


Attachments:
cputime_adjust.tgz (1.46 kB)

2018-07-09 10:50:24

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
> Please see the enclosure for the reproducer cputime_adjust.tgz

No, I'm not going to reverse engineer something if you cannot even
explain what the problem is.

2018-07-09 10:50:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

On Thu, Jul 05, 2018 at 09:58:50PM +0800, xunlei wrote:
> It is rtime < utime + stime, that is the imprecise tick-based run time
> may be larger than precise sum_exec_runtime scheduler-based run time, it
> can happen with some frequent run-sleep patterns.

You're still comparing two values that should not be compared. utime +
stime is not a run-time. You _CANNOT_ compare to rtime and doing so
means you're still not getting it.


2018-07-09 15:01:58

by Xunlei Pang

[permalink] [raw]
Subject: Re: [PATCH] sched/cputime: Ensure correct utime and stime proportion

Hi Peter,

On 7/9/18 6:48 PM, Peter Zijlstra wrote:
> On Mon, Jul 09, 2018 at 01:52:38PM +0800, Xunlei Pang wrote:
>> Please see the enclosure for the reproducer cputime_adjust.tgz
>
> No, I'm not going to reverse engineer something if you cannot even
> explain what the problem is.
>

I rewrote the whole changelog and sent v2, please have a look.

Thanks,
Xunlei