2019-05-03 18:19:16

by Joel Savitz

[permalink] [raw]
Subject: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace

In the mainline kernel, there is no quick mechanism to get the virtual
memory size of the current process from userspace.

Despite the current state of affairs, this information is available to the
user through several means, one being a linear search of the entire address
space. This is an inefficient use of cpu cycles.

A component of the libhugetlb kernel test does exactly this, and as
systems' address spaces increase beyond 32-bits, this method becomes
exceedingly tedious.

For example, on a ppc64le system with a 47-bit address space, the linear
search causes the test to hang for some unknown amount of time. I
couldn't give you an exact number because I just ran it for about 10-20
minutes and went to go do something else, probably to get coffee or
something, and when I came back, I just killed the test and patched it
to use this new mechanism. I re-ran my new version of the test using a
kernel with this patch, and of course it passed through the previously
bottlenecking codepath nearly instantaneously.

As such, I propose that the prctl syscall be extended to include the
option to retrieve TASK_SIZE from the kernel.

This patch will allow us to upgrade an O(n) codepath to O(1) in an
architecture-independent manner, and provide a mechanism for future
generations to do the same.

Changes from v2:
We now account for the case of 32-bit compat userspace on a 64-bit kernel
More detail about the nature of TASK_SIZE in documentation

Joel Savitz(2):
sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
prctl.2: Document the new PR_GET_TASK_SIZE option

include/uapi/linux/prctl.h | 3 +++
kernel/sys.c | 23 +++++++++++++++++++++++
2 files changed, 26 insertions(+)

man2/prctl.2 | 10 ++++++++++
1 file changed, 10 insertions(+)
--
2.18.1


2019-05-03 20:29:36

by Joel Savitz

[permalink] [raw]
Subject: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
copy the value of TASK_SIZE to the userspace address in arg2.

It is important that we account for the case of the userspace task
running in 32-bit compat mode on a 64-bit kernel. As such, we must be
careful to copy the correct number of bytes to userspace to avoid stack
corruption.

Suggested-by: Yuri Norov <[email protected]>
Suggested-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Joel Savitz <[email protected]>
---
include/uapi/linux/prctl.h | 3 +++
kernel/sys.c | 23 +++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 094bb03b9cc2..2c261c461952 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -229,4 +229,7 @@ struct prctl_mm_map {
# define PR_PAC_APDBKEY (1UL << 3)
# define PR_PAC_APGAKEY (1UL << 4)

+/* Get the process virtual memory size (i.e. the highest usable VM address) */
+#define PR_GET_TASK_SIZE 55
+
#endif /* _LINUX_PRCTL_H */
diff --git a/kernel/sys.c b/kernel/sys.c
index 12df0e5434b8..709584400070 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
return 1;
}

+static int prctl_get_tasksize(void __user *uaddr)
+{
+ unsigned long current_task_size, current_word_size;
+
+ current_task_size = TASK_SIZE;
+ current_word_size = sizeof(unsigned long);
+
+#ifdef CONFIG_64BIT
+ /* On 64-bit architecture, we must check whether the current thread
+ * is running in 32-bit compat mode. If it is, we can simply cut
+ * the size in half. This avoids corruption of the userspace stack.
+ */
+ if (test_thread_flag(TIF_ADDR32))
+ current_word_size >>= 1;
+#endif
+
+ return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
+}
+
int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
{
return -EINVAL;
@@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
return -EINVAL;
error = PAC_RESET_KEYS(me, arg2);
break;
+ case PR_GET_TASK_SIZE:
+ error = prctl_get_tasksize((void *)arg2);
+ break;
default:
error = -EINVAL;
break;
--
2.18.1

2019-05-03 20:31:12

by Joel Savitz

[permalink] [raw]
Subject: [PATCH v3 2/2] prctl.2: Document the new PR_GET_TASK_SIZE option

Add a short explanation of the new PR_GET_TASK_SIZE option for the benefit
of future generations.

Suggested-by: David Laight <[email protected]>
Signed-off-by: Joel Savitz <[email protected]>
---
man2/prctl.2 | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/man2/prctl.2 b/man2/prctl.2
index 06d8e13c7..cae582726 100644
--- a/man2/prctl.2
+++ b/man2/prctl.2
@@ -49,6 +49,7 @@
.\" 2013-01-10 Kees Cook, document PR_SET_PTRACER
.\" 2012-02-04 Michael Kerrisk, document PR_{SET,GET}_CHILD_SUBREAPER
.\" 2014-11-10 Dave Hansen, document PR_MPX_{EN,DIS}ABLE_MANAGEMENT
+.\" 2019-05-03 Joel Savitz, document PR_GET_TASK_SIZE
.\"
.\"
.TH PRCTL 2 2019-03-06 "Linux" "Linux Programmer's Manual"
@@ -1375,6 +1376,15 @@ system call on Tru64).
for information on versions and architectures)
Return unaligned access control bits, in the location pointed to by
.IR "(unsigned int\ *) arg2" .
+.TP
+.B PR_GET_TASK_SIZE
+Copy the value of TASK_SIZE to the userspace address in
+.IR "(unsigned long\ *) arg2" .
+This value represents the highest virtual address available
+to the current process. Return
+.B EFAULT
+if this operation fails.
+
.SH RETURN VALUE
On success,
.BR PR_GET_DUMPABLE ,
--
2.18.1

2019-05-03 21:18:58

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace

On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> In the mainline kernel, there is no quick mechanism to get the virtual
> memory size of the current process from userspace.
>
> Despite the current state of affairs, this information is available to the
> user through several means, one being a linear search of the entire address
> space. This is an inefficient use of cpu cycles.
>
> A component of the libhugetlb kernel test does exactly this, and as
> systems' address spaces increase beyond 32-bits, this method becomes
> exceedingly tedious.
>
> For example, on a ppc64le system with a 47-bit address space, the linear
> search causes the test to hang for some unknown amount of time. I
> couldn't give you an exact number because I just ran it for about 10-20
> minutes and went to go do something else, probably to get coffee or
> something, and when I came back, I just killed the test and patched it
> to use this new mechanism. I re-ran my new version of the test using a
> kernel with this patch, and of course it passed through the previously
> bottlenecking codepath nearly instantaneously.
>
> As such, I propose that the prctl syscall be extended to include the
> option to retrieve TASK_SIZE from the kernel.
>
> This patch will allow us to upgrade an O(n) codepath to O(1) in an
> architecture-independent manner, and provide a mechanism for future
> generations to do the same.

So the only reason for the new API is boosting some random poorly
written userspace test? Why don't you introduce binary search instead?

Look at /proc/<pid>/maps. It may help to reduce the memory area to be
checked.

> Changes from v2:
> We now account for the case of 32-bit compat userspace on a 64-bit kernel
> More detail about the nature of TASK_SIZE in documentation
>
> Joel Savitz(2):
> sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
> prctl.2: Document the new PR_GET_TASK_SIZE option
>
> include/uapi/linux/prctl.h | 3 +++
> kernel/sys.c | 23 +++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> man2/prctl.2 | 10 ++++++++++
> 1 file changed, 10 insertions(+)
> --
> 2.18.1

2019-05-03 21:18:58

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> copy the value of TASK_SIZE to the userspace address in arg2.
>
> It is important that we account for the case of the userspace task
> running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> careful to copy the correct number of bytes to userspace to avoid stack
> corruption.
>
> Suggested-by: Yuri Norov <[email protected]>

I actually didn't suggest that. If you _really_ need TASK_SIZE to
be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
is a compile-time information, and it may available for userspace at
compile time as well.

> Suggested-by: Alexey Dobriyan <[email protected]>
> Signed-off-by: Joel Savitz <[email protected]>
> ---
> include/uapi/linux/prctl.h | 3 +++
> kernel/sys.c | 23 +++++++++++++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 094bb03b9cc2..2c261c461952 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -229,4 +229,7 @@ struct prctl_mm_map {
> # define PR_PAC_APDBKEY (1UL << 3)
> # define PR_PAC_APGAKEY (1UL << 4)
>
> +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> +#define PR_GET_TASK_SIZE 55
> +
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 12df0e5434b8..709584400070 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
> return 1;
> }
>
> +static int prctl_get_tasksize(void __user *uaddr)
> +{
> + unsigned long current_task_size, current_word_size;
> +
> + current_task_size = TASK_SIZE;
> + current_word_size = sizeof(unsigned long);
> +
> +#ifdef CONFIG_64BIT
> + /* On 64-bit architecture, we must check whether the current thread
> + * is running in 32-bit compat mode. If it is, we can simply cut
> + * the size in half. This avoids corruption of the userspace stack.
> + */
> + if (test_thread_flag(TIF_ADDR32))

It breaks build for all architectures except x86 since TIF_ADDR32 is
defined for x86 only.

In comment to v2 I suggested you to stick to fixed-size data type to
avoid exactly this problem.

NACK

Yury

> + current_word_size >>= 1;
> +#endif
> +
> + return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> +}
> +
> int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
> {
> return -EINVAL;
> @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> return -EINVAL;
> error = PAC_RESET_KEYS(me, arg2);
> break;
> + case PR_GET_TASK_SIZE:
> + error = prctl_get_tasksize((void *)arg2);
> + break;
> default:
> error = -EINVAL;
> break;
> --
> 2.18.1

2019-05-03 22:17:19

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

On Fri, May 03, 2019 at 02:08:31PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> > When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> > copy the value of TASK_SIZE to the userspace address in arg2.
> >
> > It is important that we account for the case of the userspace task
> > running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> > careful to copy the correct number of bytes to userspace to avoid stack
> > corruption.
> >
> > Suggested-by: Yuri Norov <[email protected]>
>
> I actually didn't suggest that. If you _really_ need TASK_SIZE to
> be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
> is a compile-time information, and it may available for userspace at
> compile time as well.
>

TASK_SIZE is a runtime resolved macro, dependent on the thread currently
running on the CPU. It's not a compile time constant.

Anyways, it's proven that going prctl(2), although interesting, as
suggested by Alexey, wasn't worth the hassle as it poses more issues
than it can possibly solve.

A better way to get this value exposed to userspace is really through
/proc/<pid>/status, where one can utilize TASK_SIZE_OF(mm->owner), or
simply mm->task_size, which seems to be properly assigned for each arch


> > Suggested-by: Alexey Dobriyan <[email protected]>
> > Signed-off-by: Joel Savitz <[email protected]>
> > ---
> > include/uapi/linux/prctl.h | 3 +++
> > kernel/sys.c | 23 +++++++++++++++++++++++
> > 2 files changed, 26 insertions(+)
> >
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..2c261c461952 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,7 @@ struct prctl_mm_map {
> > # define PR_PAC_APDBKEY (1UL << 3)
> > # define PR_PAC_APGAKEY (1UL << 4)
> >
> > +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> > +#define PR_GET_TASK_SIZE 55
> > +
> > #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..709584400070 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
> > return 1;
> > }
> >
> > +static int prctl_get_tasksize(void __user *uaddr)
> > +{
> > + unsigned long current_task_size, current_word_size;
> > +
> > + current_task_size = TASK_SIZE;
> > + current_word_size = sizeof(unsigned long);
> > +
> > +#ifdef CONFIG_64BIT
> > + /* On 64-bit architecture, we must check whether the current thread
> > + * is running in 32-bit compat mode. If it is, we can simply cut
> > + * the size in half. This avoids corruption of the userspace stack.
> > + */
> > + if (test_thread_flag(TIF_ADDR32))
>
> It breaks build for all architectures except x86 since TIF_ADDR32 is
> defined for x86 only.
>
> In comment to v2 I suggested you to stick to fixed-size data type to
> avoid exactly this problem.
>
> NACK
>
> Yury
>
> > + current_word_size >>= 1;
> > +#endif
> > +
> > + return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> > +}
> > +
> > int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
> > {
> > return -EINVAL;
> > @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > return -EINVAL;
> > error = PAC_RESET_KEYS(me, arg2);
> > break;
> > + case PR_GET_TASK_SIZE:
> > + error = prctl_get_tasksize((void *)arg2);
> > + break;
> > default:
> > error = -EINVAL;
> > break;
> > --
> > 2.18.1

2019-05-03 22:19:19

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace

On Fri, May 03, 2019 at 01:49:12PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> > In the mainline kernel, there is no quick mechanism to get the virtual
> > memory size of the current process from userspace.
> >
> > Despite the current state of affairs, this information is available to the
> > user through several means, one being a linear search of the entire address
> > space. This is an inefficient use of cpu cycles.
> >
> > A component of the libhugetlb kernel test does exactly this, and as
> > systems' address spaces increase beyond 32-bits, this method becomes
> > exceedingly tedious.
> >
> > For example, on a ppc64le system with a 47-bit address space, the linear
> > search causes the test to hang for some unknown amount of time. I
> > couldn't give you an exact number because I just ran it for about 10-20
> > minutes and went to go do something else, probably to get coffee or
> > something, and when I came back, I just killed the test and patched it
> > to use this new mechanism. I re-ran my new version of the test using a
> > kernel with this patch, and of course it passed through the previously
> > bottlenecking codepath nearly instantaneously.
> >
> > As such, I propose that the prctl syscall be extended to include the
> > option to retrieve TASK_SIZE from the kernel.
> >
> > This patch will allow us to upgrade an O(n) codepath to O(1) in an
> > architecture-independent manner, and provide a mechanism for future
> > generations to do the same.
>
> So the only reason for the new API is boosting some random poorly
> written userspace test? Why don't you introduce binary search instead?
>

there's no real cost in exposing the value that is known to the kernel,
anyways, as long as it's not a freaking hassle (like trying to go with
this prctl(2) stunt). We just need to get it properly exported alongside
other task's VM-related values at /proc/<pid>/status.

> Look at /proc/<pid>/maps. It may help to reduce the memory area to be
> checked.
>
> > Changes from v2:
> > We now account for the case of 32-bit compat userspace on a 64-bit kernel
> > More detail about the nature of TASK_SIZE in documentation
> >
> > Joel Savitz(2):
> > sys/prctl: add PR_GET_TASK_SIZE option to prctl(2)
> > prctl.2: Document the new PR_GET_TASK_SIZE option
> >
> > include/uapi/linux/prctl.h | 3 +++
> > kernel/sys.c | 23 +++++++++++++++++++++++
> > 2 files changed, 26 insertions(+)
> >
> > man2/prctl.2 | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> > --
> > 2.18.1

2019-05-03 22:26:47

by Rafael Aquini

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

On Fri, May 03, 2019 at 02:08:31PM -0700, Yury Norov wrote:
> On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> > When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> > copy the value of TASK_SIZE to the userspace address in arg2.
> >
> > It is important that we account for the case of the userspace task
> > running in 32-bit compat mode on a 64-bit kernel. As such, we must be
> > careful to copy the correct number of bytes to userspace to avoid stack
> > corruption.
> >
> > Suggested-by: Yuri Norov <[email protected]>
>
> I actually didn't suggest that. If you _really_ need TASK_SIZE to
> be exposed, I would suggest to expose it in kernel headers. TASK_SIZE
> is a compile-time information, and it may available for userspace at
> compile time as well.
>
> > Suggested-by: Alexey Dobriyan <[email protected]>
> > Signed-off-by: Joel Savitz <[email protected]>
> > ---
> > include/uapi/linux/prctl.h | 3 +++
> > kernel/sys.c | 23 +++++++++++++++++++++++
> > 2 files changed, 26 insertions(+)
> >
> > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> > index 094bb03b9cc2..2c261c461952 100644
> > --- a/include/uapi/linux/prctl.h
> > +++ b/include/uapi/linux/prctl.h
> > @@ -229,4 +229,7 @@ struct prctl_mm_map {
> > # define PR_PAC_APDBKEY (1UL << 3)
> > # define PR_PAC_APGAKEY (1UL << 4)
> >
> > +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> > +#define PR_GET_TASK_SIZE 55
> > +
> > #endif /* _LINUX_PRCTL_H */
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index 12df0e5434b8..709584400070 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -2252,6 +2252,26 @@ static int propagate_has_child_subreaper(struct task_struct *p, void *data)
> > return 1;
> > }
> >
> > +static int prctl_get_tasksize(void __user *uaddr)
> > +{
> > + unsigned long current_task_size, current_word_size;
> > +
> > + current_task_size = TASK_SIZE;
> > + current_word_size = sizeof(unsigned long);
> > +
> > +#ifdef CONFIG_64BIT
> > + /* On 64-bit architecture, we must check whether the current thread
> > + * is running in 32-bit compat mode. If it is, we can simply cut
> > + * the size in half. This avoids corruption of the userspace stack.
> > + */
> > + if (test_thread_flag(TIF_ADDR32))
>
> It breaks build for all architectures except x86 since TIF_ADDR32 is
> defined for x86 only.

Or we could get TIF_32BIT also defined for x86 (same value of
TIF_ADDR32) and check for it instead. i.e.

...
#if defined(CONFIG_64BIT) && defined(TIF_32BIT)
if (test_thread_flag(TIF_32BIT))
...

which is also uglier and keeps adding unecessary complexity to a very
simple task. At this point, I think we just should give up on trying
this via prctl(2) and do it via /proc/<pid>/status instead.


>
> In comment to v2 I suggested you to stick to fixed-size data type to
> avoid exactly this problem.
>
> NACK
>
> Yury
>
> > + current_word_size >>= 1;
> > +#endif
> > +
> > + return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> > +}
> > +
> > int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
> > {
> > return -EINVAL;
> > @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> > return -EINVAL;
> > error = PAC_RESET_KEYS(me, arg2);
> > break;
> > + case PR_GET_TASK_SIZE:
> > + error = prctl_get_tasksize((void *)arg2);
> > + break;
> > default:
> > error = -EINVAL;
> > break;
> > --
> > 2.18.1

2019-05-03 23:29:06

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

On Fri, May 3, 2019 at 2:12 PM Joel Savitz <[email protected]> wrote:
> When PR_GET_TASK_SIZE is passed to prctl, the kernel will attempt to
> copy the value of TASK_SIZE to the userspace address in arg2.

A commit message shouldn't just describe what you're doing, but also
why you're doing it. Is this intended for processes that are running
on X86-64 and want to determine whether the system supports 5-level
paging, or something like that?

> +static int prctl_get_tasksize(void __user *uaddr)
> +{
> + unsigned long current_task_size, current_word_size;
> +
> + current_task_size = TASK_SIZE;
> + current_word_size = sizeof(unsigned long);
> +
> +#ifdef CONFIG_64BIT
> + /* On 64-bit architecture, we must check whether the current thread
> + * is running in 32-bit compat mode. If it is, we can simply cut
> + * the size in half. This avoids corruption of the userspace stack.
> + */
> + if (test_thread_flag(TIF_ADDR32))
> + current_word_size >>= 1;
> +#endif
> +
> + return copy_to_user(uaddr, &current_task_size, current_word_size) ? -EFAULT : 0;
> +}

This function looks completely wrong; in particular, you're assuming
that the architecture is little-endian.
Make the value a u64, and you won't have these problems:

static int prctl_get_tasksize(u64 __user *uaddr)
{
return put_user(TASK_SIZE, uaddr) ? -EFAULT : 0;
}

A bunch of other new pieces of userspace API already use "u64" to
store userspace pointers and lengths to avoid compat issues.

> @@ -2486,6 +2506,9 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> return -EINVAL;
> error = PAC_RESET_KEYS(me, arg2);
> break;
> + case PR_GET_TASK_SIZE:
> + error = prctl_get_tasksize((void *)arg2);

s/void */void __user */

2019-05-04 04:25:12

by Yury Norov

[permalink] [raw]
Subject: Re: [PATCH v3 0/2] sys/prctl: expose TASK_SIZE value to userspace

On Fri, May 03, 2019 at 05:57:31PM -0400, Rafael Aquini wrote:
> On Fri, May 03, 2019 at 01:49:12PM -0700, Yury Norov wrote:
> > On Fri, May 03, 2019 at 02:10:19PM -0400, Joel Savitz wrote:
> > > In the mainline kernel, there is no quick mechanism to get the virtual
> > > memory size of the current process from userspace.
> > >
> > > Despite the current state of affairs, this information is available to the
> > > user through several means, one being a linear search of the entire address
> > > space. This is an inefficient use of cpu cycles.
> > >
> > > A component of the libhugetlb kernel test does exactly this, and as
> > > systems' address spaces increase beyond 32-bits, this method becomes
> > > exceedingly tedious.
> > >
> > > For example, on a ppc64le system with a 47-bit address space, the linear
> > > search causes the test to hang for some unknown amount of time. I
> > > couldn't give you an exact number because I just ran it for about 10-20
> > > minutes and went to go do something else, probably to get coffee or
> > > something, and when I came back, I just killed the test and patched it
> > > to use this new mechanism. I re-ran my new version of the test using a
> > > kernel with this patch, and of course it passed through the previously
> > > bottlenecking codepath nearly instantaneously.
> > >
> > > As such, I propose that the prctl syscall be extended to include the
> > > option to retrieve TASK_SIZE from the kernel.
> > >
> > > This patch will allow us to upgrade an O(n) codepath to O(1) in an
> > > architecture-independent manner, and provide a mechanism for future
> > > generations to do the same.
> >
> > So the only reason for the new API is boosting some random poorly
> > written userspace test? Why don't you introduce binary search instead?
> >
>
> there's no real cost in exposing the value that is known to the kernel,

Really? We all here used to think that kernel programming is one of
the most difficult professions in the world. There is huge cost of
proper implementation of a feature, careful review, spread testing on
various platforms and long-term maintenance and support.

In this specific example of exposing TASK_SIZE your team made too much
things wrong to realize it, I hope.

> anyways, as long as it's not a freaking hassle (like trying to go with
> this prctl(2) stunt). We just need to get it properly exported alongside
> other task's VM-related values at /proc/<pid>/status.

I found this thread thrilling. Please keep me in CC with your
/proc/<pid>/status effort.

Yury

> > Look at /proc/<pid>/maps. It may help to reduce the memory area to be
> > checked.

2019-05-04 06:58:38

by Alexey Dobriyan

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] kernel/sys: add PR_GET_TASK_SIZE option to prctl(2)

On Fri, May 03, 2019 at 02:10:20PM -0400, Joel Savitz wrote:
> +/* Get the process virtual memory size (i.e. the highest usable VM address) */
> +#define PR_GET_TASK_SIZE 55

TASK_SIZE is in fact the lowest _un_usable address. :^)