2020-04-16 05:28:28

by Andrei Vagin

Subject: [PATCH v3 0/6] arm64: add the time namespace support

Allocate the time namespace page among VVAR pages and add the logic
to handle faults on VVAR properly.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check for the case where vdso_data->seq is odd, e.g. while a
concurrent update of the VDSO data is in progress, does not really affect
regular tasks which are not part of a time namespace, because such a task
is spin-waiting for the update to finish and vdso_data->seq to become even
again anyway.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.
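
The read path described above can be modelled in userspace roughly as
follows. This is only a sketch under stated assumptions: struct
model_vdso_data, the MODEL_* names and both functions are hypothetical
stand-ins, the single "sec" field replaces the real per-clock data and
offsets, and the memory barriers of the real seqcount loop are left out.

```c
#include <stdint.h>

/* Hypothetical, simplified model; NOT the kernel's struct vdso_data. */
enum model_clock_mode { MODEL_CLOCK_NORMAL, MODEL_CLOCK_TIMENS };

struct model_vdso_data {
	uint32_t seq;                     /* odd: update running, or timens page */
	enum model_clock_mode clock_mode;
	int64_t sec;                      /* host time, or offset on a timens page */
};

/* Seqcount-style read of the real page (barriers omitted in this model). */
static int64_t model_read_host_sec(const volatile struct model_vdso_data *vd)
{
	uint32_t seq;
	int64_t sec;

	do {
		while ((seq = vd->seq) & 1)
			;                 /* spin until the writer is done */
		sec = vd->sec;
	} while (seq != vd->seq);

	return sec;
}

/*
 * data_slot is what the task sees at the usual VVAR address; timens_slot
 * is the extra page, which is only looked at on the timens path.
 */
static int64_t model_gettime_sec(const struct model_vdso_data *data_slot,
				 const struct model_vdso_data *timens_slot)
{
	/* seq == 1 forces the slow path; clock_mode selects the timens path. */
	if ((data_slot->seq & 1) && data_slot->clock_mode == MODEL_CLOCK_TIMENS)
		return model_read_host_sec(timens_slot) + data_slot->sec;

	return model_read_host_sec(data_slot);
}
```

For a task in a time namespace the two slots are swapped relative to a
regular task, so the same code finds the namespace offset in data_slot and
the host data in timens_slot.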

v2: Code cleanups suggested by Vincenzo.
v3: use OPTIMIZER_HIDE_VAR() instead of inline assembly in
__arch_get_timens_vdso_data.

Cc: Vincenzo Frascino <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Dmitry Safonov <[email protected]>

v3 on github (if someone prefers `git pull` to `git am`):
https://github.com/avagin/linux-task-diag/tree/arm64/timens-v3

Andrei Vagin (6):
arm64/vdso: use the fault callback to map vvar pages
arm64/vdso: Zap vvar pages when switching to a time namespace
arm64/vdso: Add time namespace page
arm64/vdso: Handle faults on timens page
arm64/vdso: Restrict splitting VVAR VMA
arm64: enable time namespace support

arch/arm64/Kconfig | 1 +
.../include/asm/vdso/compat_gettimeofday.h | 11 ++
arch/arm64/include/asm/vdso/gettimeofday.h | 8 ++
arch/arm64/kernel/vdso.c | 134 ++++++++++++++++--
arch/arm64/kernel/vdso/vdso.lds.S | 3 +-
arch/arm64/kernel/vdso32/vdso.lds.S | 3 +-
include/vdso/datapage.h | 1 +
7 files changed, 147 insertions(+), 14 deletions(-)

--
2.24.1


2020-04-16 05:28:39

by Andrei Vagin

Subject: [PATCH 1/6] arm64/vdso: use the fault callback to map vvar pages

This is required to support time namespaces where a time namespace data
page is different for each namespace.

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/kernel/vdso.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 354b11e27c07..290c36d74e03 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -114,28 +114,32 @@ static int __vdso_init(enum arch_vdso_type arch_index)
PAGE_SHIFT;

/* Allocate the vDSO pagelist, plus a page for the data. */
- vdso_pagelist = kcalloc(vdso_lookup[arch_index].vdso_pages + 1,
+ vdso_pagelist = kcalloc(vdso_lookup[arch_index].vdso_pages,
sizeof(struct page *),
GFP_KERNEL);
if (vdso_pagelist == NULL)
return -ENOMEM;

- /* Grab the vDSO data page. */
- vdso_pagelist[0] = phys_to_page(__pa_symbol(vdso_data));
-
-
/* Grab the vDSO code pages. */
pfn = sym_to_pfn(vdso_lookup[arch_index].vdso_code_start);

for (i = 0; i < vdso_lookup[arch_index].vdso_pages; i++)
- vdso_pagelist[i + 1] = pfn_to_page(pfn + i);
+ vdso_pagelist[i] = pfn_to_page(pfn + i);

- vdso_lookup[arch_index].dm->pages = &vdso_pagelist[0];
- vdso_lookup[arch_index].cm->pages = &vdso_pagelist[1];
+ vdso_lookup[arch_index].cm->pages = vdso_pagelist;

return 0;
}

+static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ if (vmf->pgoff == 0)
+ return vmf_insert_pfn(vma, vmf->address,
+ sym_to_pfn(vdso_data));
+ return VM_FAULT_SIGBUS;
+}
+
static int __setup_additional_pages(enum arch_vdso_type arch_index,
struct mm_struct *mm,
struct linux_binprm *bprm,
@@ -155,7 +159,7 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
}

ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
- VM_READ|VM_MAYREAD,
+ VM_READ|VM_MAYREAD|VM_PFNMAP,
vdso_lookup[arch_index].dm);
if (IS_ERR(ret))
goto up_fail;
@@ -215,6 +219,7 @@ static struct vm_special_mapping aarch32_vdso_spec[C_PAGES] = {
#ifdef CONFIG_COMPAT_VDSO
{
.name = "[vvar]",
+ .fault = vvar_fault,
},
{
.name = "[vdso]",
@@ -396,6 +401,7 @@ static int vdso_mremap(const struct vm_special_mapping *sm,
static struct vm_special_mapping vdso_spec[A_PAGES] __ro_after_init = {
{
.name = "[vvar]",
+ .fault = vvar_fault,
},
{
.name = "[vdso]",
--
2.24.1

2020-04-16 05:28:51

by Andrei Vagin

Subject: [PATCH 2/6] arm64/vdso: Zap vvar pages when switching to a time namespace

The VVAR page layout depends on whether a task belongs to the root or
non-root time namespace. Whenever a task changes its namespace, the VVAR
page tables are cleared and then they will be re-faulted with a
corresponding layout.

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/kernel/vdso.c | 32 ++++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 290c36d74e03..6ac9cdeac5be 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -131,6 +131,38 @@ static int __vdso_init(enum arch_vdso_type arch_index)
return 0;
}

+#ifdef CONFIG_TIME_NS
+/*
+ * The vvar page layout depends on whether a task belongs to the root or
+ * non-root time namespace. Whenever a task changes its namespace, the VVAR
+ * page tables are cleared and then they will be re-faulted with a
+ * corresponding layout.
+ * See also the comment near timens_setup_vdso_data() for details.
+ */
+int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
+{
+ struct mm_struct *mm = task->mm;
+ struct vm_area_struct *vma;
+
+ if (down_write_killable(&mm->mmap_sem))
+ return -EINTR;
+
+ for (vma = mm->mmap; vma; vma = vma->vm_next) {
+ unsigned long size = vma->vm_end - vma->vm_start;
+
+ if (vma_is_special_mapping(vma, vdso_lookup[ARM64_VDSO].dm))
+ zap_page_range(vma, vma->vm_start, size);
+#ifdef CONFIG_COMPAT_VDSO
+ if (vma_is_special_mapping(vma, vdso_lookup[ARM64_VDSO32].dm))
+ zap_page_range(vma, vma->vm_start, size);
+#endif
+ }
+
+ up_write(&mm->mmap_sem);
+ return 0;
+}
+#endif
+
static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
struct vm_area_struct *vma, struct vm_fault *vmf)
{
--
2.24.1

2020-04-16 05:31:05

by Andrei Vagin

Subject: [PATCH 4/6] arm64/vdso: Handle faults on timens page

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page.
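
The page selection this implies can be sketched as a small userspace
model. The enum and function names are hypothetical and only mirror the
decision made per page offset; no real PFNs or kernel types are involved.

```c
/* Hypothetical outcomes of a fault in the VVAR area. */
enum model_fault_result {
	MODEL_MAP_HOST_PAGE,   /* system wide VDSO data page */
	MODEL_MAP_TIMENS_PAGE, /* namespace specific page */
	MODEL_SIGBUS           /* nothing to map at this offset */
};

/*
 * pgoff 0 is the data slot, pgoff 1 is the timens slot. For a task in a
 * time namespace the two pages trade places; for a regular task the
 * timens slot is not backed by anything.
 */
static enum model_fault_result model_vvar_fault(unsigned long pgoff,
						int in_timens)
{
	switch (pgoff) {
	case 0:
		return in_timens ? MODEL_MAP_TIMENS_PAGE : MODEL_MAP_HOST_PAGE;
	case 1:
		return in_timens ? MODEL_MAP_HOST_PAGE : MODEL_SIGBUS;
	default:
		return MODEL_SIGBUS;
	}
}
```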

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/kernel/vdso.c | 57 +++++++++++++++++++++++++++++++++++++---
1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index b3e7ce24e59b..fb32c6f76078 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -18,6 +18,7 @@
#include <linux/sched.h>
#include <linux/signal.h>
#include <linux/slab.h>
+#include <linux/time_namespace.h>
#include <linux/timekeeper_internal.h>
#include <linux/vmalloc.h>
#include <vdso/datapage.h>
@@ -175,15 +176,63 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
up_write(&mm->mmap_sem);
return 0;
}
+
+static struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+ if (likely(vma->vm_mm == current->mm))
+ return current->nsproxy->time_ns->vvar_page;
+
+ /*
+ * VM_PFNMAP | VM_IO protect .fault() handler from being called
+ * through interfaces like /proc/$pid/mem or
+ * process_vm_{readv,writev}() as long as there's no .access()
+ * in special_mapping_vmops().
+ * For more details see check_vma_flags() and __access_remote_vm()
+ */
+
+ WARN(1, "vvar_page accessed remotely");
+
+ return NULL;
+}
+#else
+static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma)
+{
+ return NULL;
+}
#endif

static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
struct vm_area_struct *vma, struct vm_fault *vmf)
{
- if (vmf->pgoff == 0)
- return vmf_insert_pfn(vma, vmf->address,
- sym_to_pfn(vdso_data));
- return VM_FAULT_SIGBUS;
+ struct page *timens_page = find_timens_vvar_page(vma);
+ unsigned long pfn;
+
+ switch (vmf->pgoff) {
+ case VVAR_DATA_PAGE_OFFSET:
+ if (timens_page)
+ pfn = page_to_pfn(timens_page);
+ else
+ pfn = sym_to_pfn(vdso_data);
+ break;
+#ifdef CONFIG_TIME_NS
+ case VVAR_TIMENS_PAGE_OFFSET:
+ /*
+ * If a task belongs to a time namespace then a namespace
+ * specific VVAR is mapped with the VVAR_DATA_PAGE_OFFSET and
+ * the real VVAR page is mapped with the VVAR_TIMENS_PAGE_OFFSET
+ * offset.
+ * See also the comment near timens_setup_vdso_data().
+ */
+ if (!timens_page)
+ return VM_FAULT_SIGBUS;
+ pfn = sym_to_pfn(vdso_data);
+ break;
+#endif /* CONFIG_TIME_NS */
+ default:
+ return VM_FAULT_SIGBUS;
+ }
+
+ return vmf_insert_pfn(vma, vmf->address, pfn);
}

static int __setup_additional_pages(enum arch_vdso_type arch_index,
--
2.24.1

2020-04-16 05:31:25

by Andrei Vagin

Subject: [PATCH 5/6] arm64/vdso: Restrict splitting VVAR VMA

Forbid splitting the VVAR VMA, which results in a stricter ABI and reduces
the number of corner cases to consider while working further on VDSO time
namespace support.

As the offset from the timens page to the VVAR page is computed at compile
time, the pages in VVAR should stay together and not be partially
mremap()'ed.
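
To illustrate why the pages have to stay together, here is a userspace
sketch of the address arithmetic and of the size check. All MODEL_* names
and helpers are hypothetical, and a 4 KiB page size with two VVAR pages
(CONFIG_TIME_NS enabled) is assumed purely for the example.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  4096UL /* assumption: 4 KiB pages */
#define MODEL_VVAR_PAGES 2UL    /* assumption: CONFIG_TIME_NS enabled */

/* The data page sits a fixed distance below the vDSO text. */
static uintptr_t model_vdso_data_addr(uintptr_t text_base)
{
	return text_base - MODEL_VVAR_PAGES * MODEL_PAGE_SIZE;
}

/* The timens slot is the page right after the data page. */
static uintptr_t model_timens_data_addr(uintptr_t text_base)
{
	return model_vdso_data_addr(text_base) + MODEL_PAGE_SIZE;
}

/* Accept a remap only if the whole VVAR area moves as one piece. */
static int model_vvar_mremap(uintptr_t new_start, uintptr_t new_end)
{
	if (new_end - new_start != MODEL_VVAR_PAGES * MODEL_PAGE_SIZE)
		return -EINVAL;
	return 0;
}
```

Since both addresses are derived from the text base with constants fixed
at build time, moving only one of the two pages would leave the vDSO code
pointing at the wrong place.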

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/kernel/vdso.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index fb32c6f76078..c003f7ee383a 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -235,6 +235,17 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
return vmf_insert_pfn(vma, vmf->address, pfn);
}

+static int vvar_mremap(const struct vm_special_mapping *sm,
+ struct vm_area_struct *new_vma)
+{
+ unsigned long new_size = new_vma->vm_end - new_vma->vm_start;
+
+ if (new_size != VVAR_NR_PAGES * PAGE_SIZE)
+ return -EINVAL;
+
+ return 0;
+}
+
static int __setup_additional_pages(enum arch_vdso_type arch_index,
struct mm_struct *mm,
struct linux_binprm *bprm,
@@ -315,6 +326,7 @@ static struct vm_special_mapping aarch32_vdso_spec[C_PAGES] = {
{
.name = "[vvar]",
.fault = vvar_fault,
+ .mremap = vvar_mremap,
},
{
.name = "[vdso]",
@@ -497,6 +509,7 @@ static struct vm_special_mapping vdso_spec[A_PAGES] __ro_after_init = {
{
.name = "[vvar]",
.fault = vvar_fault,
+ .mremap = vvar_mremap,
},
{
.name = "[vdso]",
--
2.24.1

2020-04-16 05:32:06

by Andrei Vagin

Subject: [PATCH 3/6] arm64/vdso: Add time namespace page

Allocate the time namespace page among VVAR pages. Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check for the case where vdso_data->seq is odd, e.g. while a
concurrent update of the VDSO data is in progress, does not really affect
regular tasks which are not part of a time namespace, because such a task
is spin-waiting for the update to finish and vdso_data->seq to become even
again anyway.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/include/asm/vdso.h | 6 ++++++
.../include/asm/vdso/compat_gettimeofday.h | 14 +++++++++++++
arch/arm64/include/asm/vdso/gettimeofday.h | 8 ++++++++
arch/arm64/kernel/vdso.c | 20 ++++++++++++++++---
arch/arm64/kernel/vdso/vdso.lds.S | 5 ++++-
arch/arm64/kernel/vdso32/vdso.lds.S | 5 ++++-
include/vdso/datapage.h | 1 +
7 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
index 07468428fd29..351c145d3808 100644
--- a/arch/arm64/include/asm/vdso.h
+++ b/arch/arm64/include/asm/vdso.h
@@ -12,6 +12,12 @@
*/
#define VDSO_LBASE 0x0

+#ifdef CONFIG_TIME_NS
+#define __VVAR_PAGES 2
+#else
+#define __VVAR_PAGES 1
+#endif
+
#ifndef __ASSEMBLY__

#include <generated/vdso-offsets.h>
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index b6907ae78e53..6ce9cdf5e08b 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -7,6 +7,8 @@

#ifndef __ASSEMBLY__

+#include <linux/compiler.h>
+
#include <asm/unistd.h>
#include <asm/errno.h>

@@ -152,6 +154,18 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
return ret;
}

+#ifdef CONFIG_TIME_NS
+static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+ const struct vdso_data *ret;
+
+ ret = _timens_data;
+ OPTIMIZER_HIDE_VAR(ret);
+
+ return ret;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index afba6ba332f8..cf39eae5eaaf 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -96,6 +96,14 @@ const struct vdso_data *__arch_get_vdso_data(void)
return _vdso_data;
}

+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+ return _timens_data;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 6ac9cdeac5be..b3e7ce24e59b 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -46,6 +46,14 @@ enum arch_vdso_type {
#define VDSO_TYPES (ARM64_VDSO + 1)
#endif /* CONFIG_COMPAT_VDSO */

+enum vvar_pages {
+ VVAR_DATA_PAGE_OFFSET = 0,
+#ifdef CONFIG_TIME_NS
+ VVAR_TIMENS_PAGE_OFFSET = 1,
+#endif /* CONFIG_TIME_NS */
+ VVAR_NR_PAGES = __VVAR_PAGES,
+};
+
struct __vdso_abi {
const char *name;
const char *vdso_code_start;
@@ -81,6 +89,12 @@ static union {
} vdso_data_store __page_aligned_data;
struct vdso_data *vdso_data = vdso_data_store.data;

+
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+ return (struct vdso_data *)(vvar_page);
+}
+
static int __vdso_remap(enum arch_vdso_type arch_index,
const struct vm_special_mapping *sm,
struct vm_area_struct *new_vma)
@@ -182,7 +196,7 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,

vdso_text_len = vdso_lookup[arch_index].vdso_pages << PAGE_SHIFT;
/* Be sure to map the data page */
- vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+ vdso_mapping_len = vdso_text_len + VVAR_NR_PAGES * PAGE_SIZE;

vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
@@ -190,13 +204,13 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
goto up_fail;
}

- ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+ ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
VM_READ|VM_MAYREAD|VM_PFNMAP,
vdso_lookup[arch_index].dm);
if (IS_ERR(ret))
goto up_fail;

- vdso_base += PAGE_SIZE;
+ vdso_base += VVAR_NR_PAGES * PAGE_SIZE;
mm->context.vdso = (void *)vdso_base;
ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
VM_READ|VM_EXEC|
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..d808ad31e01f 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(aarch64)

SECTIONS
{
- PROVIDE(_vdso_data = . - PAGE_SIZE);
+ PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+ PROVIDE(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
. = VDSO_LBASE + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S
index a3944927eaeb..06cc60a9630f 100644
--- a/arch/arm64/kernel/vdso32/vdso.lds.S
+++ b/arch/arm64/kernel/vdso32/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(arm)

SECTIONS
{
- PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE);
+ PROVIDE_HIDDEN(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+ PROVIDE_HIDDEN(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
. = VDSO_LBASE + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 5cbc9fcbfd45..2022e8c653c1 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -109,6 +109,7 @@ struct vdso_data {
* relocation, and this is what we need.
*/
extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden")));
+extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden")));

/*
* The generic vDSO implementation requires that gettimeofday.h
--
2.24.1

2020-04-16 05:32:20

by Andrei Vagin

Subject: [PATCH 6/6] arm64: enable time namespace support

CONFIG_TIME_NS depends on GENERIC_VDSO_TIME_NS.

Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 40fb05d96c60..68619faf0838 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -110,6 +110,7 @@ config ARM64
select GENERIC_STRNLEN_USER
select GENERIC_TIME_VSYSCALL
select GENERIC_GETTIMEOFDAY
+ select GENERIC_VDSO_TIME_NS
select HANDLE_DOMAIN_IRQ
select HARDIRQS_SW_RESEND
select HAVE_PCI
--
2.24.1

2020-04-16 17:23:11

by Mark Rutland

Subject: Re: [PATCH 3/6] arm64/vdso: Add time namespace page

Hi Andrei,

On Wed, Apr 15, 2020 at 10:26:15PM -0700, Andrei Vagin wrote:
> diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
> index 07468428fd29..351c145d3808 100644
> --- a/arch/arm64/include/asm/vdso.h
> +++ b/arch/arm64/include/asm/vdso.h
> @@ -12,6 +12,12 @@
> */
> #define VDSO_LBASE 0x0
>
> +#ifdef CONFIG_TIME_NS
> +#define __VVAR_PAGES 2
> +#else
> +#define __VVAR_PAGES 1
> +#endif
> +
> #ifndef __ASSEMBLY__

> +#ifdef CONFIG_TIME_NS
> +static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> +{
> + const struct vdso_data *ret;
> +
> + ret = _timens_data;
> + OPTIMIZER_HIDE_VAR(ret);
> +
> + return ret;
> +}
> +#endif

Sorry for the confusion here, but please either:

* Add a preparatory patch making __arch_get_vdso_data() use
OPTIMIZER_HIDE_VAR(), and use OPTIMIZER_HIDE_VAR() here.

* Use the same assembly as __arch_get_vdso_data() currently does.

... and either way add a comment here:

/* See __arch_get_vdso_data() */

... so that the rationale is obvious.

[...]

> +enum vvar_pages {
> + VVAR_DATA_PAGE_OFFSET = 0,
> +#ifdef CONFIG_TIME_NS
> + VVAR_TIMENS_PAGE_OFFSET = 1,
> +#endif /* CONFIG_TIME_NS */
> + VVAR_NR_PAGES = __VVAR_PAGES,
> +};

Pet peeve, but we don't need the initializers here, as enums start from
zero. The last element shouldn't have a trailing comma as we don't
expect to add elements after it in future.

Rather than assigning to VVAR_NR_PAGES, it'd be better to use a
BUILD_BUG_ON() to verify that it is the number we expect:

enum vvar_pages {
VVAR_DATA_PAGE,
#ifdef CONFIG_TIME_NS
VVAR_TIMENS_PAGE,
#endif
VVAR_NR_PAGES
};

BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
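
The same counting idiom can be tried outside the kernel. The sketch below
is only an illustration: it uses C11 _Static_assert as a stand-in for
BUILD_BUG_ON(), and MODEL_TIME_NS plus the other MODEL_* names are
hypothetical replacements for CONFIG_TIME_NS, __VVAR_PAGES and the enum
above.

```c
/* Hypothetical stand-in for CONFIG_TIME_NS; set to 0 to drop the page. */
#define MODEL_TIME_NS 1

#if MODEL_TIME_NS
#define MODEL_VVAR_PAGES 2
#else
#define MODEL_VVAR_PAGES 1
#endif

/* No initializers: enumerators count up from zero on their own. */
enum model_vvar_pages {
	MODEL_VVAR_DATA_PAGE,
#if MODEL_TIME_NS
	MODEL_VVAR_TIMENS_PAGE,
#endif
	MODEL_VVAR_NR_PAGES
};

/* Fails the build if the enum and the macro ever disagree. */
_Static_assert(MODEL_VVAR_NR_PAGES == MODEL_VVAR_PAGES,
	       "enum page count must match MODEL_VVAR_PAGES");
```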

Thanks,
Mark.

2020-04-17 04:15:36

by Andrei Vagin

Subject: [PATCH 3/6 v2] arm64/vdso: Add time namespace page

Allocate the time namespace page among VVAR pages. Provide
__arch_get_timens_vdso_data() helper for VDSO code to get the
code-relative position of VVARs on that special page.

If a task belongs to a time namespace then the VVAR page which contains
the system wide VDSO data is replaced with a namespace specific page
which has the same layout as the VVAR page. That page has vdso_data->seq
set to 1 to enforce the slow path and vdso_data->clock_mode set to
VCLOCK_TIMENS to enforce the time namespace handling path.

The extra check for the case where vdso_data->seq is odd, e.g. while a
concurrent update of the VDSO data is in progress, does not really affect
regular tasks which are not part of a time namespace, because such a task
is spin-waiting for the update to finish and vdso_data->seq to become even
again anyway.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

Cc: Mark Rutland <[email protected]>
Reviewed-by: Vincenzo Frascino <[email protected]>
Signed-off-by: Andrei Vagin <[email protected]>
---

v2: cleanups suggested by Mark.

arch/arm64/include/asm/vdso.h | 6 +++++
.../include/asm/vdso/compat_gettimeofday.h | 12 ++++++++++
arch/arm64/include/asm/vdso/gettimeofday.h | 8 +++++++
arch/arm64/kernel/vdso.c | 22 ++++++++++++++++---
arch/arm64/kernel/vdso/vdso.lds.S | 5 ++++-
arch/arm64/kernel/vdso32/vdso.lds.S | 5 ++++-
include/vdso/datapage.h | 1 +
7 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
index 07468428fd29..351c145d3808 100644
--- a/arch/arm64/include/asm/vdso.h
+++ b/arch/arm64/include/asm/vdso.h
@@ -12,6 +12,12 @@
*/
#define VDSO_LBASE 0x0

+#ifdef CONFIG_TIME_NS
+#define __VVAR_PAGES 2
+#else
+#define __VVAR_PAGES 1
+#endif
+
#ifndef __ASSEMBLY__

#include <generated/vdso-offsets.h>
diff --git a/arch/arm64/include/asm/vdso/compat_gettimeofday.h b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
index b6907ae78e53..b7c549d46d18 100644
--- a/arch/arm64/include/asm/vdso/compat_gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/compat_gettimeofday.h
@@ -152,6 +152,18 @@ static __always_inline const struct vdso_data *__arch_get_vdso_data(void)
return ret;
}

+#ifdef CONFIG_TIME_NS
+static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+ const struct vdso_data *ret;
+
+ /* See __arch_get_vdso_data(). */
+ asm volatile("mov %0, %1" : "=r"(ret) : "r"(_timens_data));
+
+ return ret;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/include/asm/vdso/gettimeofday.h b/arch/arm64/include/asm/vdso/gettimeofday.h
index afba6ba332f8..cf39eae5eaaf 100644
--- a/arch/arm64/include/asm/vdso/gettimeofday.h
+++ b/arch/arm64/include/asm/vdso/gettimeofday.h
@@ -96,6 +96,14 @@ const struct vdso_data *__arch_get_vdso_data(void)
return _vdso_data;
}

+#ifdef CONFIG_TIME_NS
+static __always_inline
+const struct vdso_data *__arch_get_timens_vdso_data(void)
+{
+ return _timens_data;
+}
+#endif
+
#endif /* !__ASSEMBLY__ */

#endif /* __ASM_VDSO_GETTIMEOFDAY_H */
diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 6ac9cdeac5be..ccac00919d89 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -46,6 +46,14 @@ enum arch_vdso_type {
#define VDSO_TYPES (ARM64_VDSO + 1)
#endif /* CONFIG_COMPAT_VDSO */

+enum vvar_pages {
+ VVAR_DATA_PAGE_OFFSET,
+#ifdef CONFIG_TIME_NS
+ VVAR_TIMENS_PAGE_OFFSET,
+#endif /* CONFIG_TIME_NS */
+ VVAR_NR_PAGES,
+};
+
struct __vdso_abi {
const char *name;
const char *vdso_code_start;
@@ -81,6 +89,12 @@ static union {
} vdso_data_store __page_aligned_data;
struct vdso_data *vdso_data = vdso_data_store.data;

+
+struct vdso_data *arch_get_vdso_data(void *vvar_page)
+{
+ return (struct vdso_data *)(vvar_page);
+}
+
static int __vdso_remap(enum arch_vdso_type arch_index,
const struct vm_special_mapping *sm,
struct vm_area_struct *new_vma)
@@ -180,9 +194,11 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
unsigned long vdso_base, vdso_text_len, vdso_mapping_len;
void *ret;

+ BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);
+
vdso_text_len = vdso_lookup[arch_index].vdso_pages << PAGE_SHIFT;
/* Be sure to map the data page */
- vdso_mapping_len = vdso_text_len + PAGE_SIZE;
+ vdso_mapping_len = vdso_text_len + VVAR_NR_PAGES * PAGE_SIZE;

vdso_base = get_unmapped_area(NULL, 0, vdso_mapping_len, 0, 0);
if (IS_ERR_VALUE(vdso_base)) {
@@ -190,13 +206,13 @@ static int __setup_additional_pages(enum arch_vdso_type arch_index,
goto up_fail;
}

- ret = _install_special_mapping(mm, vdso_base, PAGE_SIZE,
+ ret = _install_special_mapping(mm, vdso_base, VVAR_NR_PAGES * PAGE_SIZE,
VM_READ|VM_MAYREAD|VM_PFNMAP,
vdso_lookup[arch_index].dm);
if (IS_ERR(ret))
goto up_fail;

- vdso_base += PAGE_SIZE;
+ vdso_base += VVAR_NR_PAGES * PAGE_SIZE;
mm->context.vdso = (void *)vdso_base;
ret = _install_special_mapping(mm, vdso_base, vdso_text_len,
VM_READ|VM_EXEC|
diff --git a/arch/arm64/kernel/vdso/vdso.lds.S b/arch/arm64/kernel/vdso/vdso.lds.S
index 7ad2d3a0cd48..d808ad31e01f 100644
--- a/arch/arm64/kernel/vdso/vdso.lds.S
+++ b/arch/arm64/kernel/vdso/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(aarch64)

SECTIONS
{
- PROVIDE(_vdso_data = . - PAGE_SIZE);
+ PROVIDE(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+ PROVIDE(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
. = VDSO_LBASE + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
diff --git a/arch/arm64/kernel/vdso32/vdso.lds.S b/arch/arm64/kernel/vdso32/vdso.lds.S
index a3944927eaeb..06cc60a9630f 100644
--- a/arch/arm64/kernel/vdso32/vdso.lds.S
+++ b/arch/arm64/kernel/vdso32/vdso.lds.S
@@ -17,7 +17,10 @@ OUTPUT_ARCH(arm)

SECTIONS
{
- PROVIDE_HIDDEN(_vdso_data = . - PAGE_SIZE);
+ PROVIDE_HIDDEN(_vdso_data = . - __VVAR_PAGES * PAGE_SIZE);
+#ifdef CONFIG_TIME_NS
+ PROVIDE_HIDDEN(_timens_data = _vdso_data + PAGE_SIZE);
+#endif
. = VDSO_LBASE + SIZEOF_HEADERS;

.hash : { *(.hash) } :text
diff --git a/include/vdso/datapage.h b/include/vdso/datapage.h
index 5cbc9fcbfd45..2022e8c653c1 100644
--- a/include/vdso/datapage.h
+++ b/include/vdso/datapage.h
@@ -109,6 +109,7 @@ struct vdso_data {
* relocation, and this is what we need.
*/
extern struct vdso_data _vdso_data[CS_BASES] __attribute__((visibility("hidden")));
+extern struct vdso_data _timens_data[CS_BASES] __attribute__((visibility("hidden")));

/*
* The generic vDSO implementation requires that gettimeofday.h
--
2.24.1

2020-04-23 05:27:56

by Andrei Vagin

Subject: Re: [PATCH 3/6] arm64/vdso: Add time namespace page

On Thu, Apr 16, 2020 at 11:45:27AM +0100, Mark Rutland wrote:
> Hi Andrei,
>
> On Wed, Apr 15, 2020 at 10:26:15PM -0700, Andrei Vagin wrote:
> > diff --git a/arch/arm64/include/asm/vdso.h b/arch/arm64/include/asm/vdso.h
> > index 07468428fd29..351c145d3808 100644
> > --- a/arch/arm64/include/asm/vdso.h
> > +++ b/arch/arm64/include/asm/vdso.h
...
> > +#ifdef CONFIG_TIME_NS
> > +static __always_inline const struct vdso_data *__arch_get_timens_vdso_data(void)
> > +{
> > + const struct vdso_data *ret;
> > +
> > + ret = _timens_data;
> > + OPTIMIZER_HIDE_VAR(ret);
> > +
> > + return ret;
> > +}
> > +#endif
>
> Sorry for the confusion here, but please either:
>
> * Add a preparatory patch making __arch_get_vdso_data() use
> OPTIMIZER_HIDE_VAR(), and use OPTIMIZER_HIDE_VAR() here.
>
> * Use the same assembly as __arch_get_vdso_data() currently does.
>
> ... and either way add a comment here:
>
> /* See __arch_get_vdso_data() */
>
> ... so that the rationale is obvious.
>
> [...]
>
> > +enum vvar_pages {
> > + VVAR_DATA_PAGE_OFFSET = 0,
> > +#ifdef CONFIG_TIME_NS
> > + VVAR_TIMENS_PAGE_OFFSET = 1,
> > +#endif /* CONFIG_TIME_NS */
> > + VVAR_NR_PAGES = __VVAR_PAGES,
> > +};
>
> Pet peeve, but we don't need the initializers here, as enums start from
> zero. The last element shouldn't have a trailing comma as we don't
> expect to add elements after it in future.
>
> Rather than assigning to VVAR_NR_PAGES, it'd be better to use a
> BUILD_BUG_ON() to verify that it is the number we expect:
>
> enum vvar_pages {
> VVAR_DATA_PAGE,
> #ifdef CONFIG_TIME_NS
> VVAR_TIMENS_PAGE,
> #endif
> VVAR_NR_PAGES
> };
>
> BUILD_BUG_ON(VVAR_NR_PAGES != __VVAR_PAGES);

Hi Mark,

Thank you for the review. I have sent a fixed version of this patch in
reply to the original patch.

Thanks,
Andrei

2020-05-20 12:03:35

by Vincenzo Frascino

Subject: Re: [PATCH v3 0/6] arm64: add the time namespace support

Hi Andrei,

On 4/16/20 6:26 AM, Andrei Vagin wrote:
> Allocate the time namespace page among VVAR pages and add the logic
> to handle faults on VVAR properly.
>
> If a task belongs to a time namespace then the VVAR page which contains
> the system wide VDSO data is replaced with a namespace specific page
> which has the same layout as the VVAR page. That page has vdso_data->seq
> set to 1 to enforce the slow path and vdso_data->clock_mode set to
> VCLOCK_TIMENS to enforce the time namespace handling path.
>
> The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
> update of the VDSO data is in progress, is not really affecting regular
> tasks which are not part of a time namespace as the task is spin waiting
> for the update to finish and vdso_data->seq to become even again.
>
> If a time namespace task hits that code path, it invokes the corresponding
> time getter function which retrieves the real VVAR page, reads host time
> and then adds the offset for the requested clock which is stored in the
> special VVAR page.
>
> v2: Code cleanups suggested by Vincenzo.
> v3: use OPTIMIZER_HIDE_VAR() instead of inline assembly in
> __arch_get_timens_vdso_data.
>

Nit: If you re-post, I would remove the OPTIMIZER_HIDE_VAR() reference because
it does not reflect the current status of the patches.

I tested it again with your latest change in the test code and it works for me
(thank you for sending a patch for the test as well).

With this:

Reviewed-by: Vincenzo Frascino <[email protected]>

> Cc: Vincenzo Frascino <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Dmitry Safonov <[email protected]>
>
> v3 on github (if someone prefers `git pull` to `git am`):
> https://github.com/avagin/linux-task-diag/tree/arm64/timens-v3
>
> Andrei Vagin (6):
> arm64/vdso: use the fault callback to map vvar pages
> arm64/vdso: Zap vvar pages when switching to a time namespace
> arm64/vdso: Add time napespace page
> arm64/vdso: Handle faults on timens page
> arm64/vdso: Restrict splitting VVAR VMA
> arm64: enable time namespace support
>
> arch/arm64/Kconfig | 1 +
> .../include/asm/vdso/compat_gettimeofday.h | 11 ++
> arch/arm64/include/asm/vdso/gettimeofday.h | 8 ++
> arch/arm64/kernel/vdso.c | 134 ++++++++++++++++--
> arch/arm64/kernel/vdso/vdso.lds.S | 3 +-
> arch/arm64/kernel/vdso32/vdso.lds.S | 3 +-
> include/vdso/datapage.h | 1 +
> 7 files changed, 147 insertions(+), 14 deletions(-)
>

--
Regards,
Vincenzo