When a normal user tries to allocate huge pages using shmget(), he
cannot get the memory even if his gid is present in
/proc/sys/vm/hugetlb_shm_group. The user_shm_lock() call fails because
the user lacks the CAP_IPC_LOCK capability. This patch adds a check for
whether the gid is present in hugetlb_shm_group. Please review the patch.
Signed-off-by: Ciju Rajan ([email protected])
---
--- linux-2.6.23/mm/mlock.c.orig 2007-11-14 17:35:02.000000000 +0530
+++ linux-2.6.23/mm/mlock.c 2007-11-14 17:50:31.000000000 +0530
@@ -8,6 +8,7 @@
#include <linux/capability.h>
#include <linux/mman.h>
#include <linux/mm.h>
+#include <linux/hugetlb.h>
#include <linux/mempolicy.h>
#include <linux/syscalls.h>
#include <linux/sched.h>
@@ -248,8 +249,14 @@ int user_shm_lock(size_t size, struct us
allowed = 1;
lock_limit >>= PAGE_SHIFT;
spin_lock(&shmlock_user_lock);
+#ifdef CONFIG_HUGETLB_PAGE
+ if (!allowed &&
+ locked + user->locked_shm > lock_limit &&
+ (!(capable(CAP_IPC_LOCK) || in_group_p(sysctl_hugetlb_shm_group))))
+#else
if (!allowed &&
locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
+#endif
goto out;
get_uid(user);
user->locked_shm += locked;
Hi Ciju:
I am still not exactly sure why this patch is needed. As I read
user_shm_lock():
> lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> if (lock_limit == RLIM_INFINITY)
> allowed = 1;
> lock_limit >>= PAGE_SHIFT;
> spin_lock(&shmlock_user_lock);
> if (!allowed &&
> locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
> goto out;
... if the user's locked limit (ulimit -l) is set to unlimited, allowed
(above) is set to 1. In that case, the second part of that if() is
bypassed, and the function grants permission. Therefore, the easy
solution is to make sure your user's lock_limit is RLIM_INFINITY.
On Wed, 2007-11-14 at 19:45 +0530, Ciju Rajan K wrote:
<snip>
> @@ -248,8 +249,14 @@ int user_shm_lock(size_t size, struct us
> allowed = 1;
> lock_limit >>= PAGE_SHIFT;
> spin_lock(&shmlock_user_lock);
> +#ifdef CONFIG_HUGETLB_PAGE
> + if (!allowed &&
> + locked + user->locked_shm > lock_limit &&
> + (!(capable(CAP_IPC_LOCK) || in_group_p(sysctl_hugetlb_shm_group))))
This will allow any user in hugetlb_shm_group to make unlimited use of
huge page shm segments _and_ normal page shm segments. Definitely not
what you want.
> +#else
> if (!allowed &&
> locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
> +#endif
> goto out;
> get_uid(user);
> user->locked_shm += locked;
>
Please don't add new #ifdefs into .c files, headers only.
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
On Wed, Nov 14, 2007 at 09:31:41AM -0600, aglitke wrote:
> ... if the user's locked limit (ulimit -l) is set to unlimited, allowed
> (above) is set to 1. In that case, the second part of that if() is
> bypassed, and the function grants permission. Therefore, the easy
> solution is to make sure your user's lock_limit is RLIM_INFINITY.
This function deserves a minor cleanup and a bit more commenting.
Reading user->locked_shm within shmlock_user_lock would be nice, too.
Maybe something like this (untested, uncompiled) would do.
-- wli
diff --git a/mm/mlock.c b/mm/mlock.c
index 7b26560..5f51792 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -234,6 +234,12 @@ asmlinkage long sys_munlockall(void)
/*
* Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB
* shm segments) get accounted against the user_struct instead.
+ * First, user_shm_lock() checks that the user has permission to lock
+ * enough memory; then if so, the locked shm is accounted to the user's
+ * system-wide state. shmlock_user_lock protects the per-user field
+ * tracking how much locked_shm is in use within the struct user_struct.
+ * shmlock_user_lock is taken early to guard the read-only check that
+ * user->locked_shm is in-bounds against updates to user->locked_shm.
*/
static DEFINE_SPINLOCK(shmlock_user_lock);
@@ -242,19 +248,22 @@ int user_shm_lock(size_t size, struct user_struct *user)
unsigned long lock_limit, locked;
int allowed = 0;
+ spin_lock(&shmlock_user_lock);
locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
if (lock_limit == RLIM_INFINITY)
allowed = 1;
- lock_limit >>= PAGE_SHIFT;
- spin_lock(&shmlock_user_lock);
- if (!allowed &&
- locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
- goto out;
- get_uid(user);
- user->locked_shm += locked;
- allowed = 1;
-out:
+ else {
+ lock_limit >>= PAGE_SHIFT;
+ if (locked + user->locked_shm <= lock_limit)
+ allowed = 1;
+ else if (capable(CAP_IPC_LOCK))
+ allowed = 1;
+ }
+ if (allowed) {
+ get_uid(user);
+ user->locked_shm += locked;
+ }
spin_unlock(&shmlock_user_lock);
return allowed;
}
Hi Adam,
If this condition check is not included, the root user has to use
setrlimit() to set a normal user's lock_limit to RLIM_INFINITY. I think
the /proc interface 'hugetlb_shm_group' was introduced to avoid these
difficulties. Please correct me if I am wrong.
Regarding the problem with the 'if' condition: even when the user's
lock_limit is set to unlimited, he can use unlimited huge page and
normal page shm segments, so what is the advantage in that scenario?
I tried to avoid the #ifdef statements, but the variable
sysctl_hugetlb_shm_group is defined in fs/hugetlbfs/inode.c, which is
compiled only when the config parameter CONFIG_HUGETLBFS is set. If
hugetlbfs is not selected while configuring, there would be a
compilation error.
Is there a better way so that the root user can configure the gid in
'hugetlb_shm_group' and a normal user can access huge pages using
shmget()?
Thanks
Ciju
aglitke wrote:
> Hi Ciju:
>
> I am still not exactly sure why this patch is needed. As I read
> user_shm_lock():
>
>
>> lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
>> if (lock_limit == RLIM_INFINITY)
>> allowed = 1;
>> lock_limit >>= PAGE_SHIFT;
>> spin_lock(&shmlock_user_lock);
>> if (!allowed &&
>> locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
>> goto out;
>>
>
> ... if the user's locked limit (ulimit -l) is set to unlimited, allowed
> (above) is set to 1. In that case, the second part of that if() is
> bypassed, and the function grants permission. Therefore, the easy
> solution is to make sure your user's lock_limit is RLIM_INFINITY.
>
> On Wed, 2007-11-14 at 19:45 +0530, Ciju Rajan K wrote:
> <snip>
>
>> @@ -248,8 +249,14 @@ int user_shm_lock(size_t size, struct us
>> allowed = 1;
>> lock_limit >>= PAGE_SHIFT;
>> spin_lock(&shmlock_user_lock);
>> +#ifdef CONFIG_HUGETLB_PAGE
>> + if (!allowed &&
>> + locked + user->locked_shm > lock_limit &&
>> + (!(capable(CAP_IPC_LOCK) || in_group_p(sysctl_hugetlb_shm_group))))
>>
>
> This will allow any user in hugetlb_shm_group to make unlimited use of
> huge page shm segments _and_ normal page shm segments. Definitely not
> what you want.
>
>
>> +#else
>> if (!allowed &&
>> locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
>> +#endif
>> goto out;
>> get_uid(user);
>> user->locked_shm += locked;
>>
>>
>
> Please don't add new #ifdefs into .c files, headers only.
>
>
Hi Wli,
I tested your patch, but it does not solve the problem.
If the code change to user_shm_lock() is not a good solution, could
you please suggest a method so that a normal user can allocate huge
pages if his gid is added to /proc/sys/vm/hugetlb_shm_group.
Thanks
Ciju
William Lee Irwin III wrote:
> On Wed, Nov 14, 2007 at 09:31:41AM -0600, aglitke wrote:
>> ... if the user's locked limit (ulimit -l) is set to unlimited, allowed
>> (above) is set to 1. In that case, the second part of that if() is
>> bypassed, and the function grants permission. Therefore, the easy
>> solution is to make sure your user's lock_limit is RLIM_INFINITY.
>
> This function deserves a minor cleanup and a bit more commenting.
>
> Reading user->locked_shm within shmlock_user_lock would be nice, too.
>
> Maybe something like this (untested, uncompiled) would do.
>
>
> -- wli
>
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> index 7b26560..5f51792 100644
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -234,6 +234,12 @@ asmlinkage long sys_munlockall(void)
> /*
> * Objects with different lifetime than processes (SHM_LOCK and SHM_HUGETLB
> * shm segments) get accounted against the user_struct instead.
> + * First, user_shm_lock() checks that the user has permission to lock
> + * enough memory; then if so, the locked shm is accounted to the user's
> + * system-wide state. shmlock_user_lock protects the per-user field
> + * tracking how much locked_shm is in use within the struct user_struct.
> + * shmlock_user_lock is taken early to guard the read-only check that
> + * user->locked_shm is in-bounds against updates to user->locked_shm.
> */
> static DEFINE_SPINLOCK(shmlock_user_lock);
>
> @@ -242,19 +248,22 @@ int user_shm_lock(size_t size, struct user_struct *user)
> unsigned long lock_limit, locked;
> int allowed = 0;
>
> + spin_lock(&shmlock_user_lock);
> locked = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur;
> if (lock_limit == RLIM_INFINITY)
> allowed = 1;
> - lock_limit >>= PAGE_SHIFT;
> - spin_lock(&shmlock_user_lock);
> - if (!allowed &&
> - locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK))
> - goto out;
> - get_uid(user);
> - user->locked_shm += locked;
> - allowed = 1;
> -out:
> + else {
> + lock_limit >>= PAGE_SHIFT;
> + if (locked + user->locked_shm <= lock_limit)
> + allowed = 1;
> + else if (capable(CAP_IPC_LOCK))
> + allowed = 1;
> + }
> + if (allowed) {
> + get_uid(user);
> + user->locked_shm += locked;
> + }
> spin_unlock(&shmlock_user_lock);
> return allowed;
> }
On Fri, Nov 30, 2007 at 12:02:32AM +0530, Ciju Rajan K wrote:
> I tested your patch. But that is not solving the problem.
> If the code change to user_shm_lock() is not a good solution, could
> you please suggest a method so that the normal user is able to allocate
> the huge pages, if his gid is added to /proc/sys/vm/hugetlb_shm_group
The patch I posted resolves a race unrelated to your issue. Raising your
locked memory limits should not be difficult. /etc/limits.conf or similar
should set it up for you. You can also change the default rlimit in the
kernel and compile it with default limits elevated to what you want your
unprivileged process to have to start with if you're truly having lots
of trouble getting userspace to set the default limits properly. I'd
look in include/asm-generic/resource.h
-- wli
William Lee Irwin III wrote:
> On Fri, Nov 30, 2007 at 12:02:32AM +0530, Ciju Rajan K wrote:
>> I tested your patch, but it does not solve the problem.
>> If the code change to user_shm_lock() is not a good solution, could
>> you please suggest a method so that a normal user can allocate huge
>> pages if his gid is added to /proc/sys/vm/hugetlb_shm_group
>
> The patch I posted resolves a race unrelated to your issue. Raising your
> locked memory limits should not be difficult. /etc/limits.conf or similar
> should set it up for you. You can also change the default rlimit in the
> kernel and compile it with default limits elevated to what you want your
> unprivileged process to have to start with if you're truly having lots
> of trouble getting userspace to set the default limits properly. I'd
> look in include/asm-generic/resource.h
>
>
> -- wli
Hi Wli,
The documentation available in the kernel for huge pages does not
mention the issue associated with the locked memory limit. I think it
would be helpful to users if we covered this in the documentation. I
am attaching a small documentation patch.
Thanks
Ciju
Signed-off-by: Ciju Rajan ([email protected])
---
--- Documentation/vm/hugetlbpage.txt.orig 2008-01-24 16:42:40.000000000 +0530
+++ Documentation/vm/hugetlbpage.txt 2008-01-29 19:36:45.000000000 +0530
@@ -108,6 +108,18 @@ a supplementary group and system admin n
applications to use any combination of mmaps and shm* calls, though the
mount of filesystem will be required for using mmap calls.
+Note: The default locked limit in the kernel is just 32KB. If a normal
+user whose gid is present in /proc/sys/vm/hugetlb_shm_group needs
+more memory than this, the default limit must be increased. If PAM is installed
+on your system, resource limits can be configured by adding lines similar
+to the following to /etc/security/limits.conf:
+
+@hugegroup soft memlock 2097152
+@hugegroup hard memlock 2097152
+
+Otherwise, you may manipulate the locked limit directly with the 'ulimit'
+command; see its man page for more information.
+
*******************************************************************
/*
Ciju Rajan K wrote:
> William Lee Irwin III wrote:
>> On Fri, Nov 30, 2007 at 12:02:32AM +0530, Ciju Rajan K wrote:
>>> I tested your patch, but it does not solve the problem.
>>> If the code change to user_shm_lock() is not a good solution, could
>>> you please suggest a method so that a normal user can allocate
>>> huge pages if his gid is added to
>>> /proc/sys/vm/hugetlb_shm_group
>>
>> The patch I posted resolves a race unrelated to your issue. Raising your
>> locked memory limits should not be difficult. /etc/limits.conf or similar
>> should set it up for you. You can also change the default rlimit in the
>> kernel and compile it with default limits elevated to what you want your
>> unprivileged process to have to start with if you're truly having lots
>> of trouble getting userspace to set the default limits properly. I'd
>> look in include/asm-generic/resource.h
>>
>>
>> -- wli
>
> Hi Wli,
>
> The documentation available in the kernel for huge pages does not
> mention the issue associated with the locked memory limit. I think it
> would be helpful to users if we covered this in the documentation. I
> am attaching a small documentation patch.
>
> Thanks
> Ciju
>
> Signed-off-by: Ciju Rajan ([email protected])
> ---
> --- Documentation/vm/hugetlbpage.txt.orig 2008-01-24 16:42:40.000000000 +0530
> +++ Documentation/vm/hugetlbpage.txt 2008-01-29 19:36:45.000000000 +0530
> @@ -108,6 +108,18 @@ a supplementary group and system admin n
> applications to use any combination of mmaps and shm* calls, though the
> mount of filesystem will be required for using mmap calls.
>
> +Note: The default locked limit in the kernel is just 32KB. If a normal
> +user whose gid is present in /proc/sys/vm/hugetlb_shm_group needs
> +more memory than this, the default limit must be increased. If PAM is
> +installed on your system, resource limits can be configured by adding
> +lines similar to the following to /etc/security/limits.conf:
> +
> +@hugegroup soft memlock 2097152
> +@hugegroup hard memlock 2097152
> +
> +Otherwise, you may manipulate the locked limit directly with the
> +'ulimit' command; see its man page for more information.
> +
> *******************************************************************
>
> /*
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Wli,
Can this patch be included in the hugepage documentation?
Thanks
Ciju