LinuxLists.cc - [PATCH v3 0/2] sysctl: handle overflow for file-max

2018-10-16 22:35:51

Subject: [PATCH v3 0/2] sysctl: handle overflow for file-max

Hey,

Here is v3 of this patchset. Changelogs are in the individual commits.

Currently, when writing

echo 18446744073709551616 > /proc/sys/fs/file-max

/proc/sys/fs/file-max will overflow and be set to 0. That quickly
crashes the system.

The first version of this patch intended to detect the overflow and cap
at ULONG_MAX. However, we should not do this and rather return EINVAL on
overflow. The reasons are:
- this aligns with other sysctl handlers that simply reject overflows
(cf. [1], [2], and a bunch of others)
- we already do a partial fail on overflow right now
Namely, when the TMPBUFLEN is exceeded. So we already reject values
such as 184467440737095516160 (21 chars) but accept values such as
18446744073709551616 (20 chars) but both are overflows. So we should
just always reject 64bit overflows and not special-case this based on
the number of chars.

(This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)

Thanks!
Christian

[1]: fb910c42cceb ("sysctl: check for UINT_MAX before unsigned int min/max")
[2]: 196851bed522 ("s390/topology: correct topology mode proc handler")

Christian Brauner (2):
sysctl: handle overflow in proc_get_long
sysctl: handle overflow for file-max

kernel/sysctl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)

--
2.17.1

2018-10-16 22:34:41

by Christian Brauner

[permalink] [raw]

Subject: [PATCH v3 1/2] sysctl: handle overflow in proc_get_long

proc_get_long() is a funny function. It uses simple_strtoul() and for a
good reason. proc_get_long() wants to always succeed the parse and return
the maybe incorrect value and the trailing characters to check against a
pre-defined list of acceptable trailing values.
However, simple_strtoul() explicitly ignores overflows which can cause
funny things like the following to happen:

echo 18446744073709551616 > /proc/sys/fs/file-max
cat /proc/sys/fs/file-max
0

(Which will cause your system to silently die behind your back.)

On the other hand kstrtoul() does do overflow detection but does not return
the trailing characters, and also fails the parse when anything other than
'\n' is a trailing character whereas proc_get_long() wants to be more
lenient.

Now, before adding another kstrtoul() function let's simply add a static
parse strtoul_lenient() which:
- fails on overflow with -ERANGE
- returns the trailing characters to the caller

The reason why we should fail on ERANGE is that we already do a partial
fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we
already reject values such as 184467440737095516160 (21 chars) but accept
values such as 18446744073709551616 (20 chars) but both are overflows. So
we should just always reject 64bit overflows and not special-case this
based on the number of chars.

Acked-by: Kees Cook <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
v2->v3:
- (Kees) s/#include <../lib/kstrtox.h>/#include "../lib/kstrtox.h"/g
- (Kees) document strtoul_lenient()

v1->v2:
- s/sysctl_cap_erange/sysctl_lenient/g
- consistenly fail on overflow

v0->v1:
- s/sysctl_strtoul_lenient/strtoul_cap_erange/g
- (Al) remove bool overflow return argument from strtoul_cap_erange
- (Al) return ULONG_MAX on ERANGE from strtoul_cap_erange
- (Dominik) fix spelling in commit message
---
kernel/sysctl.c | 40 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index cc02050fd0c4..102aa7a65687 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -68,6 +68,8 @@
#include <linux/mount.h>
#include <linux/pipe_fs_i.h>

+#include "../lib/kstrtox.h"
+
#include <linux/uaccess.h>
#include <asm/processor.h>

@@ -2065,6 +2067,41 @@ static void proc_skip_char(char **buf, size_t *size, const char v)
}
}

+/**
+ * strtoul_lenient - parse an ASCII formatted integer from a buffer and only
+ * fail on overflow
+ *
+ * @cp: kernel buffer containing the string to parse
+ * @endp: pointer to store the trailing characters
+ * @base: the base to use
+ * @res: where the parsed integer will be stored
+ *
+ * In case of success 0 is returned and @res will contain the parsed integer,
+ * @endp will hold any trailing characters.
+ * This function will fail the parse on overflow. If there wasn't an overflow
+ * the function will defer the decision what characters count as invalid to the
+ * caller.
+ */
+static int strtoul_lenient(const char *cp, char **endp, unsigned int base,
+ unsigned long *res)
+{
+ unsigned long long result;
+ unsigned int rv;
+
+ cp = _parse_integer_fixup_radix(cp, &base);
+ rv = _parse_integer(cp, base, &result);
+ if ((rv & KSTRTOX_OVERFLOW) || (result != (unsigned long)result))
+ return -ERANGE;
+
+ cp += rv;
+
+ if (endp)
+ *endp = (char *)cp;
+
+ *res = (unsigned long)result;
+ return 0;
+}
+
#define TMPBUFLEN 22
/**
* proc_get_long - reads an ASCII formatted integer from a user buffer
@@ -2108,7 +2145,8 @@ static int proc_get_long(char **buf, size_t *size,
if (!isdigit(*p))
return -EINVAL;

- *val = simple_strtoul(p, &p, 0);
+ if (strtoul_lenient(p, &p, 0, val))
+ return -EINVAL;

len = p - tmp;

--
2.17.1

2018-10-16 22:34:50

by Christian Brauner

[permalink] [raw]

Subject: [PATCH v3 2/2] sysctl: handle overflow for file-max

Currently, when writing

echo 18446744073709551616 > /proc/sys/fs/file-max

/proc/sys/fs/file-max will overflow and be set to 0. That quickly
crashes the system.
This commit sets the max and min value for file-max and returns -EINVAL
when a long int is exceeded. Any higher value cannot currently be used as
the percpu counters are long ints and not unsigned integers. This behavior
also aligns with other tuneables that return -EINVAL when their range is
exceeded. See e.g. [1], [2] and others.

[1]: fb910c42cceb ("sysctl: check for UINT_MAX before unsigned int min/max")
[2]: 196851bed522 ("s390/topology: correct topology mode proc handler")

Acked-by: Kees Cook <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
v2->v3:
- unchanged

v2->v1:
- consistenly fail on overflow

v0->v1:
- if max value is < than ULONG_MAX use max as upper bound
- (Dominik) remove double "the" from commit message
---
kernel/sysctl.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 102aa7a65687..93456e3a90cd 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -128,6 +128,7 @@ static int __maybe_unused one = 1;
static int __maybe_unused two = 2;
static int __maybe_unused four = 4;
static unsigned long one_ul = 1;
+static unsigned long long_max = LONG_MAX;
static int one_hundred = 100;
static int one_thousand = 1000;
#ifdef CONFIG_PRINTK
@@ -1697,6 +1698,8 @@ static struct ctl_table fs_table[] = {
.maxlen = sizeof(files_stat.max_files),
.mode = 0644,
.proc_handler = proc_doulongvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &long_max,
},
{
.procname = "nr_open",
@@ -2813,6 +2816,10 @@ static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int
break;
if (neg)
continue;
+ if ((max && val > *max) || (min && val < *min)) {
+ err = -EINVAL;
+ break;
+ }
val = convmul * val / convdiv;
if ((min && val < *min) || (max && val > *max))
continue;
--
2.17.1

2018-10-16 22:37:20

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] sysctl: handle overflow for file-max

On Tue, Oct 16, 2018 at 3:33 PM, Christian Brauner <[email protected]> wrote:
> Hey,
>
> Here is v3 of this patchset. Changelogs are in the individual commits.

Thanks! These look good. Andrew, can you take these?

-Kees

>
> Currently, when writing
>
> echo 18446744073709551616 > /proc/sys/fs/file-max
>
> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> crashes the system.
>
> The first version of this patch intended to detect the overflow and cap
> at ULONG_MAX. However, we should not do this and rather return EINVAL on
> overflow. The reasons are:
> - this aligns with other sysctl handlers that simply reject overflows
> (cf. [1], [2], and a bunch of others)
> - we already do a partial fail on overflow right now
> Namely, when the TMPBUFLEN is exceeded. So we already reject values
> such as 184467440737095516160 (21 chars) but accept values such as
> 18446744073709551616 (20 chars) but both are overflows. So we should
> just always reject 64bit overflows and not special-case this based on
> the number of chars.
>
> (This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)
>
> Thanks!
> Christian
>
> [1]: fb910c42cceb ("sysctl: check for UINT_MAX before unsigned int min/max")
> [2]: 196851bed522 ("s390/topology: correct topology mode proc handler")
>
> Christian Brauner (2):
> sysctl: handle overflow in proc_get_long
> sysctl: handle overflow for file-max
>
> kernel/sysctl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 46 insertions(+), 1 deletion(-)
>
> --
> 2.17.1
>

--
Kees Cook
Pixel Security

2018-10-16 23:47:00

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] sysctl: handle overflow in proc_get_long

Christian Brauner <[email protected]> writes:

> proc_get_long() is a funny function. It uses simple_strtoul() and for a
> good reason. proc_get_long() wants to always succeed the parse and return
> the maybe incorrect value and the trailing characters to check against a
> pre-defined list of acceptable trailing values.
> However, simple_strtoul() explicitly ignores overflows which can cause
> funny things like the following to happen:
>
> echo 18446744073709551616 > /proc/sys/fs/file-max
> cat /proc/sys/fs/file-max
> 0
>
> (Which will cause your system to silently die behind your back.)
>
> On the other hand kstrtoul() does do overflow detection but does not return
> the trailing characters, and also fails the parse when anything other than
> '\n' is a trailing character whereas proc_get_long() wants to be more
> lenient.
>
> Now, before adding another kstrtoul() function let's simply add a static
> parse strtoul_lenient() which:
> - fails on overflow with -ERANGE
> - returns the trailing characters to the caller
>
> The reason why we should fail on ERANGE is that we already do a partial
> fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we
> already reject values such as 184467440737095516160 (21 chars) but accept
> values such as 18446744073709551616 (20 chars) but both are overflows. So
> we should just always reject 64bit overflows and not special-case this
> based on the number of chars.
>
> Acked-by: Kees Cook <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
> Signed-off-by: Christian Brauner <[email protected]>
> ---
> v2->v3:
> - (Kees) s/#include <../lib/kstrtox.h>/#include "../lib/kstrtox.h"/g
> - (Kees) document strtoul_lenient()
>
> v1->v2:
> - s/sysctl_cap_erange/sysctl_lenient/g
> - consistenly fail on overflow
>
> v0->v1:
> - s/sysctl_strtoul_lenient/strtoul_cap_erange/g
> - (Al) remove bool overflow return argument from strtoul_cap_erange
> - (Al) return ULONG_MAX on ERANGE from strtoul_cap_erange
> - (Dominik) fix spelling in commit message
> ---
> kernel/sysctl.c | 40 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index cc02050fd0c4..102aa7a65687 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -68,6 +68,8 @@
> #include <linux/mount.h>
> #include <linux/pipe_fs_i.h>
>
> +#include "../lib/kstrtox.h"
> +
> #include <linux/uaccess.h>
> #include <asm/processor.h>
>
> @@ -2065,6 +2067,41 @@ static void proc_skip_char(char **buf, size_t *size, const char v)
> }
> }
>
> +/**
> + * strtoul_lenient - parse an ASCII formatted integer from a buffer and only
> + * fail on overflow
> + *
> + * @cp: kernel buffer containing the string to parse
> + * @endp: pointer to store the trailing characters
> + * @base: the base to use
> + * @res: where the parsed integer will be stored
> + *
> + * In case of success 0 is returned and @res will contain the parsed integer,
> + * @endp will hold any trailing characters.
> + * This function will fail the parse on overflow. If there wasn't an overflow
> + * the function will defer the decision what characters count as invalid to the
> + * caller.
> + */
> +static int strtoul_lenient(const char *cp, char **endp, unsigned int base,
> + unsigned long *res)
> +{
> + unsigned long long result;
> + unsigned int rv;
> +
> + cp = _parse_integer_fixup_radix(cp, &base);
> + rv = _parse_integer(cp, base, &result);
> + if ((rv & KSTRTOX_OVERFLOW) || (result != (unsigned long)result))
> + return -ERANGE;
> +
> + cp += rv;
> +
> + if (endp)
> + *endp = (char *)cp;
> +
> + *res = (unsigned long)result;
> + return 0;
> +}
> +
> #define TMPBUFLEN 22
> /**
> * proc_get_long - reads an ASCII formatted integer from a user buffer
> @@ -2108,7 +2145,8 @@ static int proc_get_long(char **buf, size_t *size,
> if (!isdigit(*p))
> return -EINVAL;
>
> - *val = simple_strtoul(p, &p, 0);
> + if (strtoul_lenient(p, &p, 0, val))
> + return -EINVAL;

Is it deliberate that on an error stroul_lenient returns -ERANGE but
then proc_get_long returns -EINVAL? That feels wrong. The write system
call does not permit -ERANGE or -EINVAL for the contents of the data
so both options appear equally bad from a standards point of view.

I am just wondering what the thinking is here.

> len = p - tmp;

2018-10-17 00:25:52

by Christian Brauner

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] sysctl: handle overflow in proc_get_long

On Wed, Oct 17, 2018 at 1:46 AM Eric W. Biederman <[email protected]> wrote:
>
> Christian Brauner <[email protected]> writes:
>
> > proc_get_long() is a funny function. It uses simple_strtoul() and for a
> > good reason. proc_get_long() wants to always succeed the parse and return
> > the maybe incorrect value and the trailing characters to check against a
> > pre-defined list of acceptable trailing values.
> > However, simple_strtoul() explicitly ignores overflows which can cause
> > funny things like the following to happen:
> >
> > echo 18446744073709551616 > /proc/sys/fs/file-max
> > cat /proc/sys/fs/file-max
> > 0
> >
> > (Which will cause your system to silently die behind your back.)
> >
> > On the other hand kstrtoul() does do overflow detection but does not return
> > the trailing characters, and also fails the parse when anything other than
> > '\n' is a trailing character whereas proc_get_long() wants to be more
> > lenient.
> >
> > Now, before adding another kstrtoul() function let's simply add a static
> > parse strtoul_lenient() which:
> > - fails on overflow with -ERANGE
> > - returns the trailing characters to the caller
> >
> > The reason why we should fail on ERANGE is that we already do a partial
> > fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we
> > already reject values such as 184467440737095516160 (21 chars) but accept
> > values such as 18446744073709551616 (20 chars) but both are overflows. So
> > we should just always reject 64bit overflows and not special-case this
> > based on the number of chars.
> >
> > Acked-by: Kees Cook <[email protected]>
> > Signed-off-by: Christian Brauner <[email protected]>
> > Signed-off-by: Christian Brauner <[email protected]>
> > ---
> > v2->v3:
> > - (Kees) s/#include <../lib/kstrtox.h>/#include "../lib/kstrtox.h"/g
> > - (Kees) document strtoul_lenient()
> >
> > v1->v2:
> > - s/sysctl_cap_erange/sysctl_lenient/g
> > - consistenly fail on overflow
> >
> > v0->v1:
> > - s/sysctl_strtoul_lenient/strtoul_cap_erange/g
> > - (Al) remove bool overflow return argument from strtoul_cap_erange
> > - (Al) return ULONG_MAX on ERANGE from strtoul_cap_erange
> > - (Dominik) fix spelling in commit message
> > ---
> > kernel/sysctl.c | 40 +++++++++++++++++++++++++++++++++++++++-
> > 1 file changed, 39 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> > index cc02050fd0c4..102aa7a65687 100644
> > --- a/kernel/sysctl.c
> > +++ b/kernel/sysctl.c
> > @@ -68,6 +68,8 @@
> > #include <linux/mount.h>
> > #include <linux/pipe_fs_i.h>
> >
> > +#include "../lib/kstrtox.h"
> > +
> > #include <linux/uaccess.h>
> > #include <asm/processor.h>
> >
> > @@ -2065,6 +2067,41 @@ static void proc_skip_char(char **buf, size_t *size, const char v)
> > }
> > }
> >
> > +/**
> > + * strtoul_lenient - parse an ASCII formatted integer from a buffer and only
> > + * fail on overflow
> > + *
> > + * @cp: kernel buffer containing the string to parse
> > + * @endp: pointer to store the trailing characters
> > + * @base: the base to use
> > + * @res: where the parsed integer will be stored
> > + *
> > + * In case of success 0 is returned and @res will contain the parsed integer,
> > + * @endp will hold any trailing characters.
> > + * This function will fail the parse on overflow. If there wasn't an overflow
> > + * the function will defer the decision what characters count as invalid to the
> > + * caller.
> > + */
> > +static int strtoul_lenient(const char *cp, char **endp, unsigned int base,
> > + unsigned long *res)
> > +{
> > + unsigned long long result;
> > + unsigned int rv;
> > +
> > + cp = _parse_integer_fixup_radix(cp, &base);
> > + rv = _parse_integer(cp, base, &result);
> > + if ((rv & KSTRTOX_OVERFLOW) || (result != (unsigned long)result))
> > + return -ERANGE;
> > +
> > + cp += rv;
> > +
> > + if (endp)
> > + *endp = (char *)cp;
> > +
> > + *res = (unsigned long)result;
> > + return 0;
> > +}
> > +
> > #define TMPBUFLEN 22
> > /**
> > * proc_get_long - reads an ASCII formatted integer from a user buffer
> > @@ -2108,7 +2145,8 @@ static int proc_get_long(char **buf, size_t *size,
> > if (!isdigit(*p))
> > return -EINVAL;
> >
> > - *val = simple_strtoul(p, &p, 0);
> > + if (strtoul_lenient(p, &p, 0, val))
> > + return -EINVAL;
>
> Is it deliberate that on an error stroul_lenient returns -ERANGE but
> then proc_get_long returns -EINVAL? That feels wrong. The write system

Yes, because all other integer parsing function return ERANGE as well.
Right now there are no other users but if someone wants to use that function
I wanted it to behave like the others.

> call does not permit -ERANGE or -EINVAL for the contents of the data
> so both options appear equally bad from a standards point of view.

Right, but if you write a value that exceeds the buffer of 22 chars that is used
to parse you already get EINVAL back on current kernels.
So it didn't feel like returning EOVERFLOW or ERANGE might make sense.
I saw a change in 4.10 or 4.11 as well that used EINVAL on UINT_MAX overflow
or something. EINVAL might be enough information for userspace here ?/.

>
> I am just wondering what the thinking is here.
>
> > len = p - tmp;

2018-10-17 00:36:35

by Al Viro

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote:
> Currently, when writing
>
> echo 18446744073709551616 > /proc/sys/fs/file-max
>
> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> crashes the system.
> This commit sets the max and min value for file-max and returns -EINVAL
> when a long int is exceeded. Any higher value cannot currently be used as
> the percpu counters are long ints and not unsigned integers. This behavior
> also aligns with other tuneables that return -EINVAL when their range is
> exceeded. See e.g. [1], [2] and others.

Mostly sane, but... get_max_files() users are bloody odd. The one in
file-max limit reporting looks like a half-arsed attempt in "[PATCH] fix
file counting". The one in af_unix.c, though... I don't remember how
that check had come to be - IIRC that was a strange fallout of a thread
with me, Andrea and ANK involved, circa 1999, but I don't remember details;
Andrea, any memories? It might be worth reconsidering... The change in
question is in 2.2.4pre6; what do we use unix_nr_socks for? We try to
limit the number of PF_UNIX socks by 2 * max_files, but max_files can be
huge *and* non-constant (i.e. it can decrease). What's more, unix_tot_inflight
is unsigned int and max_files might exceed 2^31 just fine since "fs: allow
for more than 2^31 files" back in 2010... Something's fishy there...

2018-10-17 02:20:29

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v3 1/2] sysctl: handle overflow in proc_get_long

On Tue, Oct 16, 2018 at 5:24 PM, Christian Brauner <[email protected]> wrote:
> Right, but if you write a value that exceeds the buffer of 22 chars that is used
> to parse you already get EINVAL back on current kernels.
> So it didn't feel like returning EOVERFLOW or ERANGE might make sense.
> I saw a change in 4.10 or 4.11 as well that used EINVAL on UINT_MAX overflow
> or something. EINVAL might be enough information for userspace here ?/.

I'd agree: I think there is more precedent for EINVAL.

-Kees

--
Kees Cook
Pixel Security

2018-10-17 09:59:27

by Christian Brauner

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

On Wed, Oct 17, 2018 at 01:35:48AM +0100, Al Viro wrote:
> On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote:
> > Currently, when writing
> >
> > echo 18446744073709551616 > /proc/sys/fs/file-max
> >
> > /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> > crashes the system.
> > This commit sets the max and min value for file-max and returns -EINVAL
> > when a long int is exceeded. Any higher value cannot currently be used as
> > the percpu counters are long ints and not unsigned integers. This behavior
> > also aligns with other tuneables that return -EINVAL when their range is
> > exceeded. See e.g. [1], [2] and others.
>
> Mostly sane, but... get_max_files() users are bloody odd. The one in
> file-max limit reporting looks like a half-arsed attempt in "[PATCH] fix
> file counting". The one in af_unix.c, though... I don't remember how
> that check had come to be - IIRC that was a strange fallout of a thread
> with me, Andrea and ANK involved, circa 1999, but I don't remember details;
> Andrea, any memories? It might be worth reconsidering... The change in
> question is in 2.2.4pre6; what do we use unix_nr_socks for? We try to
> limit the number of PF_UNIX socks by 2 * max_files, but max_files can be

So that's something I mentioned to Kees before. It seems we should
either simply replace this check with:

if ((atomic_long_read(&unix_nr_socks) >> 1) > get_max_files())
goto out;

to protect against overflows or simply do

if (atomic_long_read(&unix_nr_socks) > get_max_files())
goto out;

> huge *and* non-constant (i.e. it can decrease). What's more, unix_tot_inflight
> is unsigned int and max_files might exceed 2^31 just fine since "fs: allow
> for more than 2^31 files" back in 2010... Something's fishy there...

What's more is that fs/file_table.c:files_maxfiles_init()
currently has:
void __init files_maxfiles_init(void)
{
unsigned long n;
unsigned long memreserve = (totalram_pages - nr_free_pages()) * 3/2;

memreserve = min(memreserve, totalram_pages - 1);
n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;

files_stat.max_files = max_t(unsigned long, n, NR_FILE);
}

given that we currently can't handle more than LONG_MAX files should we
maybe cap here? Like:

diff --git a/fs/file_table.c b/fs/file_table.c
index e49af4caf15d..dd108b4c6d72 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -376,6 +376,8 @@ void __init files_init(void)
/*
* One file with associated inode and dcache is very roughly 1K. Per default
* do not use more than 10% of our memory for files.
+ * The percpu counters only handle long ints so cap maximum number of
+ * files at LONG_MAX.
*/
void __init files_maxfiles_init(void)
{
@@ -386,4 +388,7 @@ void __init files_maxfiles_init(void)
n = ((totalram_pages - memreserve) * (PAGE_SIZE / 1024)) / 10;

files_stat.max_files = max_t(unsigned long, n, NR_FILE);
+
+ if (files_stat.max_files > LONG_MAX)
+ files_stat.max_files = LONG_MAX;
}

2018-10-18 22:00:48

by Andrea Arcangeli

[permalink] [raw]

Subject: Re: [PATCH v3 2/2] sysctl: handle overflow for file-max

Hi Al,

On Wed, Oct 17, 2018 at 01:35:48AM +0100, Al Viro wrote:
> On Wed, Oct 17, 2018 at 12:33:22AM +0200, Christian Brauner wrote:
> > Currently, when writing
> >
> > echo 18446744073709551616 > /proc/sys/fs/file-max
> >
> > /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> > crashes the system.
> > This commit sets the max and min value for file-max and returns -EINVAL
> > when a long int is exceeded. Any higher value cannot currently be used as
> > the percpu counters are long ints and not unsigned integers. This behavior
> > also aligns with other tuneables that return -EINVAL when their range is
> > exceeded. See e.g. [1], [2] and others.
>
> Mostly sane, but... get_max_files() users are bloody odd. The one in
> file-max limit reporting looks like a half-arsed attempt in "[PATCH] fix
> file counting". The one in af_unix.c, though... I don't remember how
> that check had come to be - IIRC that was a strange fallout of a thread
> with me, Andrea and ANK involved, circa 1999, but I don't remember details;
> Andrea, any memories? It might be worth reconsidering... The change in
> question is in 2.2.4pre6; what do we use unix_nr_socks for? We try to
> limit the number of PF_UNIX socks by 2 * max_files, but max_files can be
> huge *and* non-constant (i.e. it can decrease). What's more, unix_tot_inflight
> is unsigned int and max_files might exceed 2^31 just fine since "fs: allow
> for more than 2^31 files" back in 2010... Something's fishy there...

Feels like a lifetime ago :), but looking into I remembered some of
it. That thread was about some instability in unix sockets for an
unrelated bug in the garbage collector. While reviewing your fix, I
probably incidentally found a resource exhaustion problem in doing a
connect();close() loop on a listening stream af_unix. I found an
exploit somewhere in my home dated in 99 in ls -l. Then ANK found
another resource exhaustion by sending datagram sockets, which I also
found an exploit for in my home.

ANK pointed out that a connect syscall allocates two sockets, one to
be accepted by the listening process, the other is the connect
itself. That must be the explanation of the "*2".

The "max_files*2" is probably a patch was from you (which was not
overflowing back then), in attempt to fix the garbage collector issue
which initially looked like resource exhaustion.

I may have suggested to check sk_max_ack_backlog and fail connect() in
such case to solve the resource exhaustion, but my proposal was
obviously broken because connect() would return an error when the
backlog was full and I suppose I didn't implement anything like
unix_wait_for_peer. So I guess (not 100% sure) the get_max_files()*2
check stayed, not because of the bug in the garbage collector that was
fixed independently, but as a stop gap measure for the
connect();close() loop resource exhaustion.

I tried the exploit that does a connect();close() in a loop and it
gracefully hangs in unix_wait_for_peer() after sk_max_ack_backlog
connects.

Out of curiosity I tried also the dgram exploit and it hangs in
sock_alloc_send_pskb with sk_wmem_alloc_get(sk) < sk->sk_sndbuf
check. The file*2 limit couldn't have helped that one anyway.

If I set /proc/sys/net/core/somaxconn to 1000000 the exploit works
fine again and the connect;close loop again allocates infinite amount
of kernel RAM into a tiny RSS process and it triggered OOM (there was
no OOM killer in v2.2 I suppose). By default it's 128. There's also
sysctl_max_dgram_qlen for dgram that on Android is set to 600 (by
default 10).

I tend to think these resources are now capped by other means (notably
somaxconn, sysctl_max_dgram_qlen, sk_wmem_alloc_get) and unix_nr_socks
can be dropped. Or if that atomic counter is still needed it's not for
a trivial exploit anymore than just does listen(); SIGSTOP() from one
process and a connect();close() loop in another process. It'd require
more than a listening socket and heavily forking or a large increase
on the max number of file descriptors (a privileged op) to do a ton of
listens, but forking has its own memory footprint in userland too. At
the very least it should be a per-cpu counter synced to the atomic
global after a threshold.

The other reason for dropping is that it wasn't ideal that the trivial
exploit could still allocated max_files*2 SYN skbs with a loop of
connect;close, max_files*2 is too much already so I suppose it was
only a stop-gap measure to begin with.

Thanks,
Andrea

2018-10-29 14:59:15

by Christian Brauner

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] sysctl: handle overflow for file-max

On Wed, Oct 17, 2018 at 12:33:20AM +0200, Christian Brauner wrote:
> Hey,
>
> Here is v3 of this patchset. Changelogs are in the individual commits.
>
> Currently, when writing
>
> echo 18446744073709551616 > /proc/sys/fs/file-max
>
> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> crashes the system.
>
> The first version of this patch intended to detect the overflow and cap
> at ULONG_MAX. However, we should not do this and rather return EINVAL on
> overflow. The reasons are:
> - this aligns with other sysctl handlers that simply reject overflows
> (cf. [1], [2], and a bunch of others)
> - we already do a partial fail on overflow right now
> Namely, when the TMPBUFLEN is exceeded. So we already reject values
> such as 184467440737095516160 (21 chars) but accept values such as
> 18446744073709551616 (20 chars) but both are overflows. So we should
> just always reject 64bit overflows and not special-case this based on
> the number of chars.
>
> (This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)

Just so that we don't forget, can we make sure that this gets picked
into linux-next? :)

Christian

>
> Thanks!
> Christian
>
> [1]: fb910c42cceb ("sysctl: check for UINT_MAX before unsigned int min/max")
> [2]: 196851bed522 ("s390/topology: correct topology mode proc handler")
>
> Christian Brauner (2):
> sysctl: handle overflow in proc_get_long
> sysctl: handle overflow for file-max
>
> kernel/sysctl.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 46 insertions(+), 1 deletion(-)
>
> --
> 2.17.1
>

2018-10-29 21:45:19

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] sysctl: handle overflow for file-max

On Mon, Oct 29, 2018 at 7:58 AM, Christian Brauner <[email protected]> wrote:
> On Wed, Oct 17, 2018 at 12:33:20AM +0200, Christian Brauner wrote:
>> Hey,
>>
>> Here is v3 of this patchset. Changelogs are in the individual commits.
>>
>> Currently, when writing
>>
>> echo 18446744073709551616 > /proc/sys/fs/file-max
>>
>> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
>> crashes the system.
>>
>> The first version of this patch intended to detect the overflow and cap
>> at ULONG_MAX. However, we should not do this and rather return EINVAL on
>> overflow. The reasons are:
>> - this aligns with other sysctl handlers that simply reject overflows
>> (cf. [1], [2], and a bunch of others)
>> - we already do a partial fail on overflow right now
>> Namely, when the TMPBUFLEN is exceeded. So we already reject values
>> such as 184467440737095516160 (21 chars) but accept values such as
>> 18446744073709551616 (20 chars) but both are overflows. So we should
>> just always reject 64bit overflows and not special-case this based on
>> the number of chars.
>>
>> (This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)
>
> Just so that we don't forget, can we make sure that this gets picked
> into linux-next? :)

I was hoping akpm would take this? Andrew, does the v3 look okay to you?

-Kees

--
Kees Cook

2018-12-09 16:47:45

by Christian Brauner

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] sysctl: handle overflow for file-max

On Mon, Oct 29, 2018 at 10:44 PM Kees Cook <[email protected]> wrote:
>
> On Mon, Oct 29, 2018 at 7:58 AM, Christian Brauner <[email protected]> wrote:
> > On Wed, Oct 17, 2018 at 12:33:20AM +0200, Christian Brauner wrote:
> >> Hey,
> >>
> >> Here is v3 of this patchset. Changelogs are in the individual commits.
> >>
> >> Currently, when writing
> >>
> >> echo 18446744073709551616 > /proc/sys/fs/file-max
> >>
> >> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> >> crashes the system.
> >>
> >> The first version of this patch intended to detect the overflow and cap
> >> at ULONG_MAX. However, we should not do this and rather return EINVAL on
> >> overflow. The reasons are:
> >> - this aligns with other sysctl handlers that simply reject overflows
> >> (cf. [1], [2], and a bunch of others)
> >> - we already do a partial fail on overflow right now
> >> Namely, when the TMPBUFLEN is exceeded. So we already reject values
> >> such as 184467440737095516160 (21 chars) but accept values such as
> >> 18446744073709551616 (20 chars) but both are overflows. So we should
> >> just always reject 64bit overflows and not special-case this based on
> >> the number of chars.
> >>
> >> (This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)
> >
> > Just so that we don't forget, can we make sure that this gets picked
> > into linux-next? :)
>
> I was hoping akpm would take this? Andrew, does the v3 look okay to you?

gentle ping again :)

Christian

2018-12-10 19:47:50

by Kees Cook

[permalink] [raw]

Subject: Re: [PATCH v3 0/2] sysctl: handle overflow for file-max

Hi Andrew,

Can you take this patch for -mm?

-Kees

On Sun, Dec 9, 2018 at 8:41 AM Christian Brauner
<[email protected]> wrote:
>
> On Mon, Oct 29, 2018 at 10:44 PM Kees Cook <[email protected]> wrote:
> >
> > On Mon, Oct 29, 2018 at 7:58 AM, Christian Brauner <[email protected]> wrote:
> > > On Wed, Oct 17, 2018 at 12:33:20AM +0200, Christian Brauner wrote:
> > >> Hey,
> > >>
> > >> Here is v3 of this patchset. Changelogs are in the individual commits.
> > >>
> > >> Currently, when writing
> > >>
> > >> echo 18446744073709551616 > /proc/sys/fs/file-max
> > >>
> > >> /proc/sys/fs/file-max will overflow and be set to 0. That quickly
> > >> crashes the system.
> > >>
> > >> The first version of this patch intended to detect the overflow and cap
> > >> at ULONG_MAX. However, we should not do this and rather return EINVAL on
> > >> overflow. The reasons are:
> > >> - this aligns with other sysctl handlers that simply reject overflows
> > >> (cf. [1], [2], and a bunch of others)
> > >> - we already do a partial fail on overflow right now
> > >> Namely, when the TMPBUFLEN is exceeded. So we already reject values
> > >> such as 184467440737095516160 (21 chars) but accept values such as
> > >> 18446744073709551616 (20 chars) but both are overflows. So we should
> > >> just always reject 64bit overflows and not special-case this based on
> > >> the number of chars.
> > >>
> > >> (This patchset is in reference to https://lkml.org/lkml/2018/10/11/585.)
> > >
> > > Just so that we don't forget, can we make sure that this gets picked
> > > into linux-next? :)
> >
> > I was hoping akpm would take this? Andrew, does the v3 look okay to you?
>
> gentle ping again :)
>
> Christian

--
Kees Cook