2000-12-07 12:45:35

by Szabolcs Szakacsits

[permalink] [raw]
Subject: [PATCH] Broken NR_RESERVED_FILES


Reserved fd's for superuser doesn't work. Patch for 2.2 is below,
kernel 2.4.x also has this problem, fix is similar. The default
NR_RESERVED_FILES value also had to be increased (e.g. ssh, login
needs 36, ls 16, man 45 fd's, etc).

BTW, I have an updated version of my reserved VM for superuser +
improved/fixed version of Rik's out of memory killer patch for 2.2
here,

http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_vm+oom_killer-5.diff

It fixes the potential deadlock when kernel threads were blocked to
try to free pages - more details about the patch are in a former
email, http://boudicca.tux.org/hypermail/linux-kernel/2000week48/0624.html

Szaka

diff -ur linux-2.2.18pre21/fs/file_table.c linux/fs/file_table.c
--- linux-2.2.18pre21/fs/file_table.c Tue Jan 4 13:12:23 2000
+++ linux/fs/file_table.c Thu Dec 7 13:26:06 2000
@@ -71,30 +71,27 @@
{
static int old_max = 0;
struct file * f;
+ int total_free;

- if (nr_free_files > NR_RESERVED_FILES) {
- used_one:
- f = free_filps;
- remove_filp(f);
- nr_free_files--;
- new_one:
- memset(f, 0, sizeof(*f));
- f->f_count = 1;
- f->f_version = ++global_event;
- f->f_uid = current->fsuid;
- f->f_gid = current->fsgid;
- put_inuse(f);
- return f;
- }
- /*
- * Use a reserved one if we're the superuser
- */
- if (nr_free_files && !current->euid)
- goto used_one;
- /*
- * Allocate a new one if we're below the limit.
- */
- if (nr_files < max_files) {
+ total_free = max_files - nr_files + nr_free_files;
+ if (total_free > NR_RESERVED_FILES || (total_free && !current->euid)) {
+ if (nr_free_files) {
+ used_one:
+ f = free_filps;
+ remove_filp(f);
+ nr_free_files--;
+ new_one:
+ memset(f, 0, sizeof(*f));
+ f->f_count = 1;
+ f->f_version = ++global_event;
+ f->f_uid = current->fsuid;
+ f->f_gid = current->fsgid;
+ put_inuse(f);
+ return f;
+ }
+ /*
+ * Allocate a new one if we're below the limit.
+ */
f = kmem_cache_alloc(filp_cache, SLAB_KERNEL);
if (f) {
nr_files++;
diff -ur linux-2.2.18pre21/include/linux/fs.h linux/include/linux/fs.h
--- linux-2.2.18pre21/include/linux/fs.h Thu Nov 9 08:20:18 2000
+++ linux/include/linux/fs.h Thu Dec 7 11:10:50 2000
@@ -51,7 +51,7 @@
extern int max_super_blocks, nr_super_blocks;

#define NR_FILE 4096 /* this can well be larger on a larger system */
-#define NR_RESERVED_FILES 10 /* reserved for root */
+#define NR_RESERVED_FILES 96 /* reserved for root */
#define NR_SUPER 256

#define MAY_EXEC 1


2000-12-07 13:22:11

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES

On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> Reserved fd's for superuser doesn't work.

It does actually work, but remember that the concept of "reserved file
structures for superuser" is defined as "file structures to be taken from
the freelist" whereas your patch below:

> + total_free = max_files - nr_files + nr_free_files;
> + if (total_free > NR_RESERVED_FILES || (total_free && !current->euid)) {
> + if (nr_free_files) {
> + used_one:
> + f = free_filps;
> + remove_filp(f);
> + nr_free_files--;
> + new_one:
> + memset(f, 0, sizeof(*f));
> + f->f_count = 1;
> + f->f_version = ++global_event;
> + f->f_uid = current->fsuid;
> + f->f_gid = current->fsgid;
> + put_inuse(f);
> + return f;
> + }
> + /*
> + * Allocate a new one if we're below the limit.
> + */
> f = kmem_cache_alloc(filp_cache, SLAB_KERNEL);

allows one to allocate a file structure from the filp_cache slab cache if
one is a superuser. That is not a "fix" -- it is a serious re-definition
of semantics of 'freelist'. There are even books (Understanding the Linux
Kernel by Bovet et all) which describe this freelist in the current
context so your patch will require updates to the books.

Having said that, it is probably a good idea to allow superuser to
allocate new file structures... just remember that you have just
_rewritten_ get_empty_filp()'s rules completely and not "fixed" them.

Regards,
Tigran

2000-12-07 13:35:26

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES

Hi,

Another comment on your patch. You removed the goto used_one (probably a
good idea, I hated it as well and preferred to put it into the if()) but
you forgot to remove the label itself.

Regards,
Tigran


2000-12-07 14:03:11

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES


On Thu, 7 Dec 2000, Tigran Aivazian wrote:
> On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > Reserved fd's for superuser doesn't work.
> It does actually work,

What do you mean under "work"? I meant user apps are able to
exhaust fd's completely and none is left for superuser.

> but remember that the concept of "reserved file
> structures for superuser" is defined as "file structures to be taken from
> the freelist"

Yes, in this sense it works and it's also very close to unhelpfulness.

> whereas your patch below:
[...]
> allows one to allocate a file structure from the filp_cache slab cache if
> one is a superuser.

Or one is user and didn't hit yet the reserved fd's (and of course
superuser aren't able to allocate more then max_files).

Szaka

2000-12-07 14:31:48

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES

On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> On Thu, 7 Dec 2000, Tigran Aivazian wrote:
> > On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > > Reserved fd's for superuser doesn't work.
> > It does actually work,
>
> What do you mean under "work"? I meant user apps are able to
> exhaust fd's completely and none is left for superuser.

really? how did you manage to do that? On a 2.4.0-test12-pre7 system I
cannot reproduce the behaviour you describe. I.e. at least
NR_RESERVED_FILES (which I agree with you should be increased!) are left
to superuser processes whilst the user processes are denied.

How exactly are you reproducing this? I wrote a simple while (1)
{open("/dev/null", 0); } program and run many instances of it as user. At
some stage user starts failing but superuser happily draws from the
freelist. Of course, the superuser cannot start complex programs which
require many descriptos but that is entirely the issues of
NR_RESERVED_FILES being too small on which I totally agree with you.

Regards,
Tigran

2000-12-07 15:27:29

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES


On Thu, 7 Dec 2000, Tigran Aivazian wrote:
> On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > On Thu, 7 Dec 2000, Tigran Aivazian wrote:
> > > On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > > > Reserved fd's for superuser doesn't work.
> > > It does actually work,
> >
> > What do you mean under "work"? I meant user apps are able to
> > exhaust fd's completely and none is left for superuser.
>
> really? how did you manage to do that? On a 2.4.0-test12-pre7 system I

Just as you but I think you are testing on a margin condition [when
all file structure were already allocated], just reboot and try it
again. The failed logic is also clear from the kernel code [user
happily allocates when freelist < NR_RESERVED_FILES].

Note, I tested only 2.2 but 2.4 apparently also has the same bad
logic. Maybe with 2.4 you are also just lucky because of several other
reasons [e.g. kernel tries harder to keep freelist high and you also
have the fast enough box for it, etc].

> cannot reproduce the behaviour you describe. I.e. at least
> NR_RESERVED_FILES (which I agree with you should be increased!) are left
> to superuser processes whilst the user processes are denied.
>
> How exactly are you reproducing this? I wrote a simple while (1)
> {open("/dev/null", 0); } program and run many instances of it as user. At
> some stage user starts failing but superuser happily draws from the
> freelist. Of course, the superuser cannot start complex programs which
> require many descriptos but that is entirely the issues of

Well, about maybe 'dmesg' will work, not others for the current
default value (10). Note, the number of fd's aren't the *currently*
used one but apparently about all of them - even if they were closed
after the open - during the existence of the app [probably buffering
and maybe this changed in 2.4].

> NR_RESERVED_FILES being too small on which I totally agree with you.

Yep, with it's current value it's just a code decoration without any
gains.

Szaka

2000-12-07 16:11:58

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES

On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> again. The failed logic is also clear from the kernel code [user
> happily allocates when freelist < NR_RESERVED_FILES].

is it clear to you? it is not clear to me, or rather the opposite seems
clear. This is what the code looks like (in 2.4):

struct file * get_empty_filp(void)
{
static int old_max = 0;
struct file * f;

file_list_lock();
if (files_stat.nr_free_files > NR_RESERVED_FILES) {
used_one:
f = list_entry(free_list.next, struct file, f_list);
list_del(&f->f_list);
files_stat.nr_free_files--;

so, a normal user is only allowed to allocate from the freelist when the
number of elements on the freelist is > NR_RESERVED_FILES. I do not see
how you are able to take elements from the freelist when the number is <
NR_RESERVED_FILES unless you are a super-user, i.e. current->euid == 0.

Btw, while you are there (in 2.2 kernel) you may want to fix the
/proc/sys/fs/file-nr -- it is broken because it assumes that nr_files,
nr_free_files, max_files are allocated by the compiler at consecutive
addresses. I have fixed this for 2.4 ages ago but couldn't be bothered
with 2.2.x...

Regards,
Tigran


2000-12-07 16:33:24

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES


On Thu, 7 Dec 2000, Tigran Aivazian wrote:
> On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > again. The failed logic is also clear from the kernel code [user
> > happily allocates when freelist < NR_RESERVED_FILES].
>
> is it clear to you? it is not clear to me, or rather the opposite seems
> clear. This is what the code looks like (in 2.4):
>
> struct file * get_empty_filp(void)
> {
> static int old_max = 0;
> struct file * f;
>
> file_list_lock();
> if (files_stat.nr_free_files > NR_RESERVED_FILES) {
> used_one:
> f = list_entry(free_list.next, struct file, f_list);
> list_del(&f->f_list);
> files_stat.nr_free_files--;
>
> so, a normal user is only allowed to allocate from the freelist when the
> number of elements on the freelist is > NR_RESERVED_FILES. I do not see
> how you are able to take elements from the freelist when the number is <
> NR_RESERVED_FILES unless you are a super-user, i.e. current->euid == 0.

Read the whole get_empty_filp function, especially this part, note the
goto new_one below and the part you didn't include above [from
the new_one label],

if (files_stat.nr_files < files_stat.max_files) {
file_list_unlock();
f = kmem_cache_alloc(filp_cachep, SLAB_KERNEL);
file_list_lock();
if (f) {
files_stat.nr_files++;
goto new_one;
}

> Btw, while you are there (in 2.2 kernel) you may want to fix the

Sorry, no time.

Szaka

2000-12-07 16:40:24

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES

On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> Read the whole get_empty_filp function, especially this part, note the
> goto new_one below and the part you didn't include above [from
> the new_one label],
>
> if (files_stat.nr_files < files_stat.max_files) {
> file_list_unlock();
> f = kmem_cache_alloc(filp_cachep, SLAB_KERNEL);
> file_list_lock();
> if (f) {
> files_stat.nr_files++;
> goto new_one;
> }

I have read the whole function, including the above code, of course. The
new_one label has nothing to do with freelists -- it adds the file to the
anon_list, where the new arrivales from the slab cache go. The goto
new_one above is there simply to initialize the structure with sane
initial values

So, the normal user _cannot_ take a file structure from the freelist
unless it contains more than NR_RESERVED_FILE entries. Please read the
whole function and see it for yourself.

Regards,
Tigran

2000-12-07 17:03:35

by Szabolcs Szakacsits

[permalink] [raw]
Subject: Re: [PATCH] Broken NR_RESERVED_FILES


On Thu, 7 Dec 2000, Tigran Aivazian wrote:

> On Thu, 7 Dec 2000, Szabolcs Szakacsits wrote:
> > Read the whole get_empty_filp function, especially this part, note the
> > goto new_one below and the part you didn't include above [from
> > the new_one label],
> >
> > if (files_stat.nr_files < files_stat.max_files) {
> > file_list_unlock();
> > f = kmem_cache_alloc(filp_cachep, SLAB_KERNEL);
> > file_list_lock();
> > if (f) {
> > files_stat.nr_files++;
> > goto new_one;
> > }
>
> I have read the whole function, including the above code, of course. The
> new_one label has nothing to do with freelists -- it adds the file to the
> anon_list, where the new arrivales from the slab cache go. The goto
> new_one above is there simply to initialize the structure with sane
> initial values

OK, 2.2 has

put_inuse(f);

instead of putting it to anon_list, so 2.4 seems ok.

Szaka

> So, the normal user _cannot_ take a file structure from the freelist
> unless it contains more than NR_RESERVED_FILE entries. Please read the
> whole function and see it for yourself.
>
> Regards,
> Tigran
>
>