2020-10-20 07:35:29

by Michael Weiß

[permalink] [raw]
Subject: [PATCH v4 0/3] time namespace aware system boot time

Time namespaces make it possible to virtualize time inside of
containers, e.g., it is feasible to reset the uptime of a container
to zero by setting the time namespace offset for boottime to the
negated current value of the CLOCK_BOOTTIME.

However, the boot time stamp provided by getboottime64() does not
take care of time namespaces. The resulting boot time stamp 'btime'
provided by /proc/stat does not show a plausible time stamp inside
the time namespace of a container.

We address this by shifting the value returned by getboottime64()
by subtracting the boottime offset of the time namespace.
(A selftest to check the expected /proc/stat 'btime' inside the
namespace is provided.)

Further, to avoid to show processes as time travelers inside of the
time namespace the boottime offset then needs to be added to the
start_boottime provided by the task_struct.

v4 Changes:
Avoid type conversions back and forth between timespec64 and ktime_t
in 'proc/stat.c' as suggested by Andrei.
Introduced timens_sub_boottime() in 'time_namespace.h' to provide
better coder readability/consistency.

v3 Changes:
leave getboottime64() unchanged and shift the boot timestamp in
'fs/proc/stat.c' as result of the discussion with Andrei and Thomas.

v2 Changes:
Fixed compile errors with TIME_NS not set in config

Michael Weiß (3):
timens: additional helper functions for boottime offset handling
fs/proc: apply the time namespace offset to /proc/stat btime
selftests/timens: added selftest for /proc/stat btime

fs/proc/array.c | 6 ++-
fs/proc/stat.c | 3 ++
include/linux/time_namespace.h | 22 ++++++++++
tools/testing/selftests/timens/procfs.c | 58 ++++++++++++++++++++++++-
4 files changed, 86 insertions(+), 3 deletions(-)

--
2.20.1


2020-10-20 07:35:46

by Michael Weiß

[permalink] [raw]
Subject: [PATCH v4 1/3] timens: additional helper functions for boottime offset handling

Provide functions for time_namespace to subtract the boottime offset
from a timespec64 as well as to apply the boottime offset to u64 types in
nanoseconds.

Signed-off-by: Michael Weiß <[email protected]>
---
include/linux/time_namespace.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 5b6031385db0..68770ac9ba89 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -77,6 +77,20 @@ static inline void timens_add_boottime(struct timespec64 *ts)
*ts = timespec64_add(*ts, ns_offsets->boottime);
}

+static inline u64 timens_add_boottime_ns(u64 nsec)
+{
+ struct timens_offsets *ns_offsets = &current->nsproxy->time_ns->offsets;
+
+ return nsec + timespec64_to_ns(&ns_offsets->boottime);
+}
+
+static inline void timens_sub_boottime(struct timespec64 *ts)
+{
+ struct timens_offsets *ns_offsets = &current->nsproxy->time_ns->offsets;
+
+ *ts = timespec64_sub(*ts, ns_offsets->boottime);
+}
+
ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
struct timens_offsets *offsets);

@@ -130,6 +144,14 @@ static inline int timens_on_fork(struct nsproxy *nsproxy,

static inline void timens_add_monotonic(struct timespec64 *ts) { }
static inline void timens_add_boottime(struct timespec64 *ts) { }
+
+static inline u64 timens_add_boottime_ns(u64 nsec)
+{
+ return nsec;
+}
+
+static inline void timens_sub_boottime(struct timespec64 *ts) { }
+
static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
{
return tim;
--
2.20.1

2020-10-20 07:36:58

by Michael Weiß

[permalink] [raw]
Subject: [PATCH v4 2/3] fs/proc: apply the time namespace offset to /proc/stat btime

'/proc/stat' provides the field 'btime' which states the time stamp of
system boot in seconds. In case of time namespaces, the offset to the
boot time stamp was not applied earlier. However, in container
runtimes which utilize time namespaces to virtualize boottime of a
container, this leaks information about the host system boot time.

Therefore, we make procfs to virtualize also the btime field by
subtracting the offset of the timens boottime from 'btime' before
printing the stats.

Since start_boottime of processes are seconds since boottime and the
boottime stamp is now shifted according to the timens offset, the
offset of the time namespace also needs to be applied before the
process stats are given to userspace.

This avoids that processes shown, e.g., by 'ps' appear as time
travelers in the corresponding time namespace.

Signed-off-by: Michael Weiß <[email protected]>
---
fs/proc/array.c | 6 ++++--
fs/proc/stat.c | 3 +++
2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 65ec2029fa80..277f654f289e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -56,6 +56,7 @@
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/time.h>
+#include <linux/time_namespace.h>
#include <linux/kernel.h>
#include <linux/kernel_stat.h>
#include <linux/tty.h>
@@ -533,8 +534,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
priority = task_prio(task);
nice = task_nice(task);

- /* convert nsec -> ticks */
- start_time = nsec_to_clock_t(task->start_boottime);
+ /* apply timens offset for boottime and convert nsec -> ticks */
+ start_time =
+ nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime));

seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
seq_puts(m, " (");
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 46b3293015fe..9df128ea9417 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -10,6 +10,7 @@
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/time.h>
+#include <linux/time_namespace.h>
#include <linux/irqnr.h>
#include <linux/sched/cputime.h>
#include <linux/tick.h>
@@ -118,6 +119,8 @@ static int show_stat(struct seq_file *p, void *v)
irq = softirq = steal = 0;
guest = guest_nice = 0;
getboottime64(&boottime);
+ /* shift boot timestamp according to the timens offset */
+ timens_sub_boottime(&boottime);

for_each_possible_cpu(i) {
struct kernel_cpustat kcpustat;
--
2.20.1

2020-10-20 07:37:28

by Michael Weiß

[permalink] [raw]
Subject: [PATCH v4 3/3] selftests/timens: added selftest for /proc/stat btime

Test that btime value of /proc/stat is as expected in the time namespace
using a simple parser to get btime from /proc/stat.

Signed-off-by: Michael Weiß <[email protected]>
---
tools/testing/selftests/timens/procfs.c | 58 ++++++++++++++++++++++++-
1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c
index 7f14f0fdac84..f2519154208a 100644
--- a/tools/testing/selftests/timens/procfs.c
+++ b/tools/testing/selftests/timens/procfs.c
@@ -93,6 +93,33 @@ static int read_proc_uptime(struct timespec *uptime)
return 0;
}

+static int read_proc_stat_btime(unsigned long long *boottime_sec)
+{
+ FILE *proc;
+ char line_buf[2048];
+
+ proc = fopen("/proc/stat", "r");
+ if (proc == NULL) {
+ pr_perror("Unable to open /proc/stat");
+ return -1;
+ }
+
+ while (fgets(line_buf, 2048, proc)) {
+ if (sscanf(line_buf, "btime %llu", boottime_sec) != 1)
+ continue;
+ fclose(proc);
+ return 0;
+ }
+ if (errno) {
+ pr_perror("fscanf");
+ fclose(proc);
+ return -errno;
+ }
+ pr_err("failed to parse /proc/stat");
+ fclose(proc);
+ return -1;
+}
+
static int check_uptime(void)
{
struct timespec uptime_new, uptime_old;
@@ -123,18 +150,47 @@ static int check_uptime(void)
return 0;
}

+static int check_stat_btime(void)
+{
+ unsigned long long btime_new, btime_old;
+ unsigned long long btime_expected;
+
+ if (switch_ns(parent_ns))
+ return pr_err("switch_ns(%d)", parent_ns);
+
+ if (read_proc_stat_btime(&btime_old))
+ return 1;
+
+ if (switch_ns(child_ns))
+ return pr_err("switch_ns(%d)", child_ns);
+
+ if (read_proc_stat_btime(&btime_new))
+ return 1;
+
+ btime_expected = btime_old - TEN_DAYS_IN_SEC;
+ if (btime_new != btime_expected) {
+ pr_fail("btime in /proc/stat: old %llu, new %llu [%llu]",
+ btime_old, btime_new, btime_expected);
+ return 1;
+ }
+
+ ksft_test_result_pass("Passed for /proc/stat btime\n");
+ return 0;
+}
+
int main(int argc, char *argv[])
{
int ret = 0;

nscheck();

- ksft_set_plan(1);
+ ksft_set_plan(2);

if (init_namespaces())
return 1;

ret |= check_uptime();
+ ret |= check_stat_btime();

if (ret)
ksft_exit_fail();
--
2.20.1

2020-10-22 09:08:50

by Andrei Vagin

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] time namespace aware system boot time

On Mon, Oct 19, 2020 at 09:52:54PM +0200, Michael Weiß wrote:
> Time namespaces make it possible to virtualize time inside of
> containers, e.g., it is feasible to reset the uptime of a container
> to zero by setting the time namespace offset for boottime to the
> negated current value of the CLOCK_BOOTTIME.
>
> However, the boot time stamp provided by getboottime64() does not
> take care of time namespaces. The resulting boot time stamp 'btime'
> provided by /proc/stat does not show a plausible time stamp inside
> the time namespace of a container.
>
> We address this by shifting the value returned by getboottime64()
> by subtracting the boottime offset of the time namespace.
> (A selftest to check the expected /proc/stat 'btime' inside the
> namespace is provided.)
>
> Further, to avoid to show processes as time travelers inside of the
> time namespace the boottime offset then needs to be added to the
> start_boottime provided by the task_struct.
>
> v4 Changes:
> Avoid type conversions back and forth between timespec64 and ktime_t
> in 'proc/stat.c' as suggested by Andrei.
> Introduced timens_sub_boottime() in 'time_namespace.h' to provide
> better coder readability/consistency.
>

Reviewed-by: Andrei Vagin <[email protected]>

Thanks,
Andrei

2020-10-26 10:42:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 0/3] time namespace aware system boot time

On Mon, Oct 19 2020 at 21:52, Michael Weiß wrote:
> Michael Weiß (3):
> timens: additional helper functions for boottime offset handling
> fs/proc: apply the time namespace offset to /proc/stat btime
> selftests/timens: added selftest for /proc/stat btime
>
> fs/proc/array.c | 6 ++-
> fs/proc/stat.c | 3 ++
> include/linux/time_namespace.h | 22 ++++++++++
> tools/testing/selftests/timens/procfs.c | 58 ++++++++++++++++++++++++-
> 4 files changed, 86 insertions(+), 3 deletions(-)

Acked-by: Thomas Gleixner <[email protected]>

2020-10-26 10:44:01

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v4 2/3] fs/proc: apply the time namespace offset to /proc/stat btime

On Mon, Oct 19 2020 at 21:52, Michael Weiß wrote:

> '/proc/stat' provides the field 'btime' which states the time stamp of
> system boot in seconds. In case of time namespaces, the offset to the
> boot time stamp was not applied earlier. However, in container
> runtimes which utilize time namespaces to virtualize boottime of a
> container, this leaks information about the host system boot time.

Not sure if that qualifies as a leak. The point is that it confuses the
tasks which are in a different time universe.

Thanks,

tglx

2020-10-27 14:43:11

by Michael Weiß

[permalink] [raw]
Subject: Re: [PATCH v4 2/3] fs/proc: apply the time namespace offset to /proc/stat btime

Thomas,

On 26.10.20 11:28, Thomas Gleixner wrote:
> On Mon, Oct 19 2020 at 21:52, Michael Weiß wrote:
>
>> '/proc/stat' provides the field 'btime' which states the time stamp of
>> system boot in seconds. In case of time namespaces, the offset to the
>> boot time stamp was not applied earlier. However, in container
>> runtimes which utilize time namespaces to virtualize boottime of a
>> container, this leaks information about the host system boot time.
>
> Not sure if that qualifies as a leak. The point is that it confuses the
> tasks which are in a different time universe.
>
From a container isolation perspective this could be interpreted as an
information leak, if you want to hide the host systems boot time form
users inside of the container.
However, since proc is not totally isolated concerning host system
information inside of namespaces, I agree to your point.

I will take this into account when preparing v5.
When amending the tags I will also updated this passage and remove the
leak statement.

Thanks,
Michael