While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
on x86_64, i386, arm and arm64 the following tests failed.
tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
getxattr05.c:87: TPASS: Got same data when acquiring the value of
system.posix_acl_access twice
getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
tst_test.c:391: TBROK: Invalid child (13545) exit value 1
fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
tst_test.c:391: TBROK: Invalid child (14739) exit value 1
sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
strace output:
--------------
[pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
[pid 481] clone(child_stack=NULL,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x7f3af0fa7a10) = 483
strace: Process 483 attached
[pid 481] wait4(-1, <unfinished ...>
[pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
metadata:
git branch: master
git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git commit: 57c149e506d5bec1b845ad1a8a631063fcac1f6e
git describe: next-20220110
arch: x86
toolchain: gcc-11
Reported-by: Linux Kernel Functional Testing <[email protected]>
GOOD: next-20220107
BAD: next-20220110
Test logs:
https://lkft.validation.linaro.org/scheduler/job/4301888#L1474
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20220110/testrun/7253656/suite/ltp-syscalls-tests/test/getxattr05/log
compare test history:
https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20220112/testrun/7277164/suite/ltp-syscalls-tests/test/getxattr05/history/
kernel-config:
https://builds.tuxbuild.com/23V6AwGvHW7H3kr6WxZZwueajVS/config
We are investigating this regression.
Steps to reproduce:
# cd /opt/ltp
# ./runltp -s getxattr05
--
Linaro LKFT
https://lkft.linaro.org
On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> on x86_64, i386, arm and arm64 the following tests failed.
>
> tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> getxattr05.c:87: TPASS: Got same data when acquiring the value of
> system.posix_acl_access twice
> getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> tst_test.c:391: TBROK: Invalid child (13545) exit value 1
>
> fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> tst_test.c:391: TBROK: Invalid child (14739) exit value 1
>
> sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
>
> setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
>
> strace output:
> --------------
> [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> [pid 481] clone(child_stack=NULL,
> flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x7f3af0fa7a10) = 483
> strace: Process 483 attached
> [pid 481] wait4(-1, <unfinished ...>
> [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
This looks like another regression in the ucount code. Reverting the
following commit fixes it and makes the getxattr05 test work again:
commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
Author: Alexey Gladkov <[email protected]>
Date: Fri Dec 17 15:48:23 2021 +0100
ucounts: Split rlimit and ucount values and max values
Since the semantics of maximum rlimit values are different, it would be
better not to mix ucount and rlimit values. This will prevent the error
of using inc_count/dec_ucount for rlimit parameters.
This patch also renames the functions to emphasize the lack of
connection between rlimit and ucount.
v2:
- Fix the array-index-out-of-bounds that was found by the lkp project.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <[email protected]>
The issue only surfaces if /proc/sys/user/max_user_namespaces is
actually written to.
On Wed, 12 Jan 2022 at 14:18, Christian Brauner
<[email protected]> wrote:
>
> On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> > on x86_64, i386, arm and arm64 the following tests failed.
> >
> > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> > getxattr05.c:87: TPASS: Got same data when acquiring the value of
> > system.posix_acl_access twice
> > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
> >
> > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
> >
> > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
> >
> > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
> >
> > strace output:
> > --------------
> > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> > [pid 481] clone(child_stack=NULL,
> > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > child_tidptr=0x7f3af0fa7a10) = 483
> > strace: Process 483 attached
> > [pid 481] wait4(-1, <unfinished ...>
> > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
>
> This looks like another regression in the ucount code. Reverting the
> following commit fixes it and makes the getxattr05 test work again:
>
> commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
> Author: Alexey Gladkov <[email protected]>
> Date: Fri Dec 17 15:48:23 2021 +0100
>
> ucounts: Split rlimit and ucount values and max values
>
> Since the semantics of maximum rlimit values are different, it would be
> better not to mix ucount and rlimit values. This will prevent the error
> of using inc_count/dec_ucount for rlimit parameters.
>
> This patch also renames the functions to emphasize the lack of
> connection between rlimit and ucount.
>
> v2:
> - Fix the array-index-out-of-bounds that was found by the lkp project.
>
> Reported-by: kernel test robot <[email protected]>
> Signed-off-by: Alexey Gladkov <[email protected]>
> Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
> Signed-off-by: Eric W. Biederman <[email protected]>
>
> The issue only surfaces if /proc/sys/user/max_user_namespaces is
> actually written to.
I did a git bisect and that pointed me to this patch too.
Cheers,
Anders
On Wed, Jan 12, 2022 at 02:22:42PM +0100, Anders Roxell wrote:
> On Wed, 12 Jan 2022 at 14:18, Christian Brauner
> <[email protected]> wrote:
> >
> > On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> > > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> > > on x86_64, i386, arm and arm64 the following tests failed.
> > >
> > > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> > > getxattr05.c:87: TPASS: Got same data when acquiring the value of
> > > system.posix_acl_access twice
> > > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
> > >
> > > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> > > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
> > >
> > > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
> > >
> > > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
> > >
> > > strace output:
> > > --------------
> > > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> > > [pid 481] clone(child_stack=NULL,
> > > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > > child_tidptr=0x7f3af0fa7a10) = 483
> > > strace: Process 483 attached
> > > [pid 481] wait4(-1, <unfinished ...>
> > > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
> >
> > This looks like another regression in the ucount code. Reverting the
> > following commit fixes it and makes the getxattr05 test work again:
> >
> > commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
> > Author: Alexey Gladkov <[email protected]>
> > Date: Fri Dec 17 15:48:23 2021 +0100
> >
> > ucounts: Split rlimit and ucount values and max values
> >
> > Since the semantics of maximum rlimit values are different, it would be
> > better not to mix ucount and rlimit values. This will prevent the error
> > of using inc_count/dec_ucount for rlimit parameters.
> >
> > This patch also renames the functions to emphasize the lack of
> > connection between rlimit and ucount.
> >
> > v2:
> > - Fix the array-index-out-of-bounds that was found by the lkp project.
> >
> > Reported-by: kernel test robot <[email protected]>
> > Signed-off-by: Alexey Gladkov <[email protected]>
> > Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
> > Signed-off-by: Eric W. Biederman <[email protected]>
> >
> > The issue only surfaces if /proc/sys/user/max_user_namespaces is
> > actually written to.
>
> I did a git bisect and that pointed me to this patch too.
Uhm, doesn't this want to be:
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 22070f004e97..108c6a879cd8 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -264,7 +264,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
long ret = 0;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- long new = atomic_long_add_return(v, &iter->ucount[type]);
+ long new = atomic_long_add_return(v, &iter->rlimit[type]);
if (new < 0 || new > max)
ret = LONG_MAX;
else if (iter == ucounts)
@@ -279,7 +279,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
struct ucounts *iter;
long new = -1; /* Silence compiler warning */
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- long dec = atomic_long_sub_return(v, &iter->ucount[type]);
+ long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
if (iter == ucounts)
new = dec;
otherwise,
inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_NPROC, 1)
means
long inc_rlimit_ucounts(struct ucounts *ucounts, UCOUNT_RLIMIT_NPROC, long v)
{
struct ucounts *iter;
long max = LONG_MAX;
long ret = 0;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long new = atomic_long_add_return(v, &iter->ucount[UCOUNT_RLIMIT_NPROC]);
if (new < 0 || new > max)
ret = LONG_MAX;
else if (iter == ucounts)
ret = new;
max = get_userns_rlimit_max(iter->ns, UCOUNT_RLIMIT_NPROC);
}
return ret;
}
which means that UCOUNT_RLIMIT_NPROC overwrites ucount[UCOUNT_RLIMIT_NPROC]?
On Wed, Jan 12, 2022 at 03:02:54PM +0100, Christian Brauner wrote:
> On Wed, Jan 12, 2022 at 02:22:42PM +0100, Anders Roxell wrote:
> > On Wed, 12 Jan 2022 at 14:18, Christian Brauner
> > <[email protected]> wrote:
> > >
> > > On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> > > > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> > > > on x86_64, i386, arm and arm64 the following tests failed.
> > > >
> > > > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> > > > getxattr05.c:87: TPASS: Got same data when acquiring the value of
> > > > system.posix_acl_access twice
> > > > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
> > > >
> > > > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> > > > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
> > > >
> > > > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > >
> > > > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > >
> > > > strace output:
> > > > --------------
> > > > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> > > > [pid 481] clone(child_stack=NULL,
> > > > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > > > child_tidptr=0x7f3af0fa7a10) = 483
> > > > strace: Process 483 attached
> > > > [pid 481] wait4(-1, <unfinished ...>
> > > > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
> > >
> > > This looks like another regression in the ucount code. Reverting the
> > > following commit fixes it and makes the getxattr05 test work again:
> > >
> > > commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
> > > Author: Alexey Gladkov <[email protected]>
> > > Date: Fri Dec 17 15:48:23 2021 +0100
> > >
> > > ucounts: Split rlimit and ucount values and max values
> > >
> > > Since the semantics of maximum rlimit values are different, it would be
> > > better not to mix ucount and rlimit values. This will prevent the error
> > > of using inc_count/dec_ucount for rlimit parameters.
> > >
> > > This patch also renames the functions to emphasize the lack of
> > > connection between rlimit and ucount.
> > >
> > > v2:
> > > - Fix the array-index-out-of-bounds that was found by the lkp project.
> > >
> > > Reported-by: kernel test robot <[email protected]>
> > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
> > > Signed-off-by: Eric W. Biederman <[email protected]>
> > >
> > > The issue only surfaces if /proc/sys/user/max_user_namespaces is
> > > actually written to.
> >
> > I did a git bisect and that pointed me to this patch too.
>
> Uhm, doesn't this want to be:
Yes. I miss it. I tried not to mix the logic, but I myself stepped on this
problem.
>
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 22070f004e97..108c6a879cd8 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -264,7 +264,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
> long ret = 0;
>
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> - long new = atomic_long_add_return(v, &iter->ucount[type]);
> + long new = atomic_long_add_return(v, &iter->rlimit[type]);
> if (new < 0 || new > max)
> ret = LONG_MAX;
> else if (iter == ucounts)
> @@ -279,7 +279,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
> struct ucounts *iter;
> long new = -1; /* Silence compiler warning */
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> - long dec = atomic_long_sub_return(v, &iter->ucount[type]);
> + long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
> WARN_ON_ONCE(dec < 0);
> if (iter == ucounts)
> new = dec;
>
>
> otherwise,
>
> inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_NPROC, 1)
>
> means
>
> long inc_rlimit_ucounts(struct ucounts *ucounts, UCOUNT_RLIMIT_NPROC, long v)
> {
> struct ucounts *iter;
> long max = LONG_MAX;
> long ret = 0;
>
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> long new = atomic_long_add_return(v, &iter->ucount[UCOUNT_RLIMIT_NPROC]);
> if (new < 0 || new > max)
> ret = LONG_MAX;
> else if (iter == ucounts)
> ret = new;
> max = get_userns_rlimit_max(iter->ns, UCOUNT_RLIMIT_NPROC);
> }
> return ret;
> }
>
> which means that UCOUNT_RLIMIT_NPROC overwrites ucount[UCOUNT_RLIMIT_NPROC]?
>
--
Rgrds, legion
On Wed, Jan 12, 2022 at 03:14:45PM +0100, Alexey Gladkov wrote:
> On Wed, Jan 12, 2022 at 03:02:54PM +0100, Christian Brauner wrote:
> > On Wed, Jan 12, 2022 at 02:22:42PM +0100, Anders Roxell wrote:
> > > On Wed, 12 Jan 2022 at 14:18, Christian Brauner
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> > > > > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> > > > > on x86_64, i386, arm and arm64 the following tests failed.
> > > > >
> > > > > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> > > > > getxattr05.c:87: TPASS: Got same data when acquiring the value of
> > > > > system.posix_acl_access twice
> > > > > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
> > > > >
> > > > > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> > > > > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
> > > > >
> > > > > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > > >
> > > > > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > > >
> > > > > strace output:
> > > > > --------------
> > > > > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> > > > > [pid 481] clone(child_stack=NULL,
> > > > > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > > > > child_tidptr=0x7f3af0fa7a10) = 483
> > > > > strace: Process 483 attached
> > > > > [pid 481] wait4(-1, <unfinished ...>
> > > > > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
> > > >
> > > > This looks like another regression in the ucount code. Reverting the
> > > > following commit fixes it and makes the getxattr05 test work again:
> > > >
> > > > commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
> > > > Author: Alexey Gladkov <[email protected]>
> > > > Date: Fri Dec 17 15:48:23 2021 +0100
> > > >
> > > > ucounts: Split rlimit and ucount values and max values
> > > >
> > > > Since the semantics of maximum rlimit values are different, it would be
> > > > better not to mix ucount and rlimit values. This will prevent the error
> > > > of using inc_count/dec_ucount for rlimit parameters.
> > > >
> > > > This patch also renames the functions to emphasize the lack of
> > > > connection between rlimit and ucount.
> > > >
> > > > v2:
> > > > - Fix the array-index-out-of-bounds that was found by the lkp project.
> > > >
> > > > Reported-by: kernel test robot <[email protected]>
> > > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > > Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
> > > > Signed-off-by: Eric W. Biederman <[email protected]>
> > > >
> > > > The issue only surfaces if /proc/sys/user/max_user_namespaces is
> > > > actually written to.
> > >
> > > I did a git bisect and that pointed me to this patch too.
> >
> > Uhm, doesn't this want to be:
>
> Yes. I miss it. I tried not to mix the logic, but I myself stepped on this
> problem.
It should be fixed in the four places:
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 22070f004e97..5c373a453f43 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -264,7 +264,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
long ret = 0;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- long new = atomic_long_add_return(v, &iter->ucount[type]);
+ long new = atomic_long_add_return(v, &iter->rlimit[type]);
if (new < 0 || new > max)
ret = LONG_MAX;
else if (iter == ucounts)
@@ -279,7 +279,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
struct ucounts *iter;
long new = -1; /* Silence compiler warning */
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- long dec = atomic_long_sub_return(v, &iter->ucount[type]);
+ long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
if (iter == ucounts)
new = dec;
@@ -292,7 +292,7 @@ static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
{
struct ucounts *iter, *next;
for (iter = ucounts; iter != last; iter = next) {
- long dec = atomic_long_sub_return(1, &iter->ucount[type]);
+ long dec = atomic_long_sub_return(1, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
next = iter->ns->ucounts;
if (dec == 0)
@@ -313,7 +313,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
long dec, ret = 0;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- long new = atomic_long_add_return(1, &iter->ucount[type]);
+ long new = atomic_long_add_return(1, &iter->rlimit[type]);
if (new < 0 || new > max)
goto unwind;
if (iter == ucounts)
@@ -330,7 +330,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
}
return ret;
dec_unwind:
- dec = atomic_long_sub_return(1, &iter->ucount[type]);
+ dec = atomic_long_sub_return(1, &iter->rlimit[type]);
WARN_ON_ONCE(dec < 0);
unwind:
do_dec_rlimit_put_ucounts(ucounts, iter, type);
--
Rgrds, legion
On Wed, 12 Jan 2022 at 15:28, Alexey Gladkov <[email protected]> wrote:
>
> On Wed, Jan 12, 2022 at 03:14:45PM +0100, Alexey Gladkov wrote:
> > On Wed, Jan 12, 2022 at 03:02:54PM +0100, Christian Brauner wrote:
> > > On Wed, Jan 12, 2022 at 02:22:42PM +0100, Anders Roxell wrote:
> > > > On Wed, 12 Jan 2022 at 14:18, Christian Brauner
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
> > > > > > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
> > > > > > on x86_64, i386, arm and arm64 the following tests failed.
> > > > > >
> > > > > > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
> > > > > > getxattr05.c:87: TPASS: Got same data when acquiring the value of
> > > > > > system.posix_acl_access twice
> > > > > > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > > > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
> > > > > >
> > > > > > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
> > > > > > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
> > > > > > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
> > > > > >
> > > > > > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > > > >
> > > > > > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
> > > > > >
> > > > > > strace output:
> > > > > > --------------
> > > > > > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
> > > > > > [pid 481] clone(child_stack=NULL,
> > > > > > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> > > > > > child_tidptr=0x7f3af0fa7a10) = 483
> > > > > > strace: Process 483 attached
> > > > > > [pid 481] wait4(-1, <unfinished ...>
> > > > > > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
> > > > >
> > > > > This looks like another regression in the ucount code. Reverting the
> > > > > following commit fixes it and makes the getxattr05 test work again:
> > > > >
> > > > > commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
> > > > > Author: Alexey Gladkov <[email protected]>
> > > > > Date: Fri Dec 17 15:48:23 2021 +0100
> > > > >
> > > > > ucounts: Split rlimit and ucount values and max values
> > > > >
> > > > > Since the semantics of maximum rlimit values are different, it would be
> > > > > better not to mix ucount and rlimit values. This will prevent the error
> > > > > of using inc_count/dec_ucount for rlimit parameters.
> > > > >
> > > > > This patch also renames the functions to emphasize the lack of
> > > > > connection between rlimit and ucount.
> > > > >
> > > > > v2:
> > > > > - Fix the array-index-out-of-bounds that was found by the lkp project.
> > > > >
> > > > > Reported-by: kernel test robot <[email protected]>
> > > > > Signed-off-by: Alexey Gladkov <[email protected]>
> > > > > Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
> > > > > Signed-off-by: Eric W. Biederman <[email protected]>
> > > > >
> > > > > The issue only surfaces if /proc/sys/user/max_user_namespaces is
> > > > > actually written to.
> > > >
> > > > I did a git bisect and that pointed me to this patch too.
> > >
> > > Uhm, doesn't this want to be:
> >
> > Yes. I miss it. I tried not to mix the logic, but I myself stepped on this
> > problem.
>
> It should be fixed in the four places:
>
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 22070f004e97..5c373a453f43 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -264,7 +264,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
> long ret = 0;
>
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> - long new = atomic_long_add_return(v, &iter->ucount[type]);
> + long new = atomic_long_add_return(v, &iter->rlimit[type]);
> if (new < 0 || new > max)
> ret = LONG_MAX;
> else if (iter == ucounts)
> @@ -279,7 +279,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
> struct ucounts *iter;
> long new = -1; /* Silence compiler warning */
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> - long dec = atomic_long_sub_return(v, &iter->ucount[type]);
> + long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
> WARN_ON_ONCE(dec < 0);
> if (iter == ucounts)
> new = dec;
> @@ -292,7 +292,7 @@ static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> {
> struct ucounts *iter, *next;
> for (iter = ucounts; iter != last; iter = next) {
> - long dec = atomic_long_sub_return(1, &iter->ucount[type]);
> + long dec = atomic_long_sub_return(1, &iter->rlimit[type]);
> WARN_ON_ONCE(dec < 0);
> next = iter->ns->ucounts;
> if (dec == 0)
> @@ -313,7 +313,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
> long dec, ret = 0;
>
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> - long new = atomic_long_add_return(1, &iter->ucount[type]);
> + long new = atomic_long_add_return(1, &iter->rlimit[type]);
> if (new < 0 || new > max)
> goto unwind;
> if (iter == ucounts)
> @@ -330,7 +330,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
> }
> return ret;
> dec_unwind:
> - dec = atomic_long_sub_return(1, &iter->ucount[type]);
> + dec = atomic_long_sub_return(1, &iter->rlimit[type]);
> WARN_ON_ONCE(dec < 0);
> unwind:
> do_dec_rlimit_put_ucounts(ucounts, iter, type);
>
Thank you for the fix.
I applied this patch and built and ran it in qemu for arm64 and x86.
'./runltp -s getxattr05' passed on both architectures.
Tested-by: Linux Kernel Functional Testing <[email protected]>
Cheers,
Anders
Anders Roxell <[email protected]> writes:
> On Wed, 12 Jan 2022 at 15:28, Alexey Gladkov <[email protected]> wrote:
>>
>> On Wed, Jan 12, 2022 at 03:14:45PM +0100, Alexey Gladkov wrote:
>> > On Wed, Jan 12, 2022 at 03:02:54PM +0100, Christian Brauner wrote:
>> > > On Wed, Jan 12, 2022 at 02:22:42PM +0100, Anders Roxell wrote:
>> > > > On Wed, 12 Jan 2022 at 14:18, Christian Brauner
>> > > > <[email protected]> wrote:
>> > > > >
>> > > > > On Wed, Jan 12, 2022 at 05:15:37PM +0530, Naresh Kamboju wrote:
>> > > > > > While testing LTP syscalls with Linux next 20220110 (and till date 20220112)
>> > > > > > on x86_64, i386, arm and arm64 the following tests failed.
>> > > > > >
>> > > > > > tst_test.c:1365: TINFO: Timeout per run is 0h 15m 00s
>> > > > > > getxattr05.c:87: TPASS: Got same data when acquiring the value of
>> > > > > > system.posix_acl_access twice
>> > > > > > getxattr05.c:97: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
>> > > > > > tst_test.c:391: TBROK: Invalid child (13545) exit value 1
>> > > > > >
>> > > > > > fanotify17.c:176: TINFO: Test #1: Global groups limit in privileged user ns
>> > > > > > fanotify17.c:155: TFAIL: unshare(CLONE_NEWUSER) failed: ENOSPC (28)
>> > > > > > tst_test.c:391: TBROK: Invalid child (14739) exit value 1
>> > > > > >
>> > > > > > sendto03.c:48: TBROK: unshare(268435456) failed: ENOSPC (28)
>> > > > > >
>> > > > > > setsockopt05.c:45: TBROK: unshare(268435456) failed: ENOSPC (28)
>> > > > > >
>> > > > > > strace output:
>> > > > > > --------------
>> > > > > > [pid 481] wait4(-1, 0x7fff52f5ae8c, 0, NULL) = -1 ECHILD (No child processes)
>> > > > > > [pid 481] clone(child_stack=NULL,
>> > > > > > flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
>> > > > > > child_tidptr=0x7f3af0fa7a10) = 483
>> > > > > > strace: Process 483 attached
>> > > > > > [pid 481] wait4(-1, <unfinished ...>
>> > > > > > [pid 483] unshare(CLONE_NEWUSER) = -1 ENOSPC (No space left on device)
>> > > > >
>> > > > > This looks like another regression in the ucount code. Reverting the
>> > > > > following commit fixes it and makes the getxattr05 test work again:
>> > > > >
>> > > > > commit 0315b634f933b0f12cfa82660322f6186c1aa0f4
>> > > > > Author: Alexey Gladkov <[email protected]>
>> > > > > Date: Fri Dec 17 15:48:23 2021 +0100
>> > > > >
>> > > > > ucounts: Split rlimit and ucount values and max values
>> > > > >
>> > > > > Since the semantics of maximum rlimit values are different, it would be
>> > > > > better not to mix ucount and rlimit values. This will prevent the error
>> > > > > of using inc_count/dec_ucount for rlimit parameters.
>> > > > >
>> > > > > This patch also renames the functions to emphasize the lack of
>> > > > > connection between rlimit and ucount.
>> > > > >
>> > > > > v2:
>> > > > > - Fix the array-index-out-of-bounds that was found by the lkp project.
>> > > > >
>> > > > > Reported-by: kernel test robot <[email protected]>
>> > > > > Signed-off-by: Alexey Gladkov <[email protected]>
>> > > > > Link: https://lkml.kernel.org/r/73ea569042babda5cee2092423da85027ceb471f.1639752364.git.legion@kernel.org
>> > > > > Signed-off-by: Eric W. Biederman <[email protected]>
>> > > > >
>> > > > > The issue only surfaces if /proc/sys/user/max_user_namespaces is
>> > > > > actually written to.
>> > > >
>> > > > I did a git bisect and that pointed me to this patch too.
>> > >
>> > > Uhm, doesn't this want to be:
>> >
>> > Yes. I miss it. I tried not to mix the logic, but I myself stepped on this
>> > problem.
>>
>> It should be fixed in the four places:
>>
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 22070f004e97..5c373a453f43 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -264,7 +264,7 @@ long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
>> long ret = 0;
>>
>> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> - long new = atomic_long_add_return(v, &iter->ucount[type]);
>> + long new = atomic_long_add_return(v, &iter->rlimit[type]);
>> if (new < 0 || new > max)
>> ret = LONG_MAX;
>> else if (iter == ucounts)
>> @@ -279,7 +279,7 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v)
>> struct ucounts *iter;
>> long new = -1; /* Silence compiler warning */
>> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> - long dec = atomic_long_sub_return(v, &iter->ucount[type]);
>> + long dec = atomic_long_sub_return(v, &iter->rlimit[type]);
>> WARN_ON_ONCE(dec < 0);
>> if (iter == ucounts)
>> new = dec;
>> @@ -292,7 +292,7 @@ static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> {
>> struct ucounts *iter, *next;
>> for (iter = ucounts; iter != last; iter = next) {
>> - long dec = atomic_long_sub_return(1, &iter->ucount[type]);
>> + long dec = atomic_long_sub_return(1, &iter->rlimit[type]);
>> WARN_ON_ONCE(dec < 0);
>> next = iter->ns->ucounts;
>> if (dec == 0)
>> @@ -313,7 +313,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
>> long dec, ret = 0;
>>
>> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> - long new = atomic_long_add_return(1, &iter->ucount[type]);
>> + long new = atomic_long_add_return(1, &iter->rlimit[type]);
>> if (new < 0 || new > max)
>> goto unwind;
>> if (iter == ucounts)
>> @@ -330,7 +330,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type)
>> }
>> return ret;
>> dec_unwind:
>> - dec = atomic_long_sub_return(1, &iter->ucount[type]);
>> + dec = atomic_long_sub_return(1, &iter->rlimit[type]);
>> WARN_ON_ONCE(dec < 0);
>> unwind:
>> do_dec_rlimit_put_ucounts(ucounts, iter, type);
>>
>
> Thank you for the fix.
> I applied this patch and built and ran it in qemu for arm64 and x86.
> './runltp -s getxattr05' passed on both architectures.
>
> Tested-by: Linux Kernel Functional Testing <[email protected]>
Thank you all.
For now I have dropped this from linux-next. I will add the fix and
will aim to get this cleanup in the next merge window.
Eric