2001-10-19 04:18:56

by Andrea Arcangeli

Subject: 2.4.13pre5aa1

The vm part in particular is right now getting stressed on a 16G box kindly
provided by osdlab.org and it hasn't exhibited any problem yet. This is a trace
of the workload that is running on the machine overnight.

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 3 2 7055840 5592 196 4208 681 903 687 904 56 87 0 5 95
0 3 0 7055332 4956 184 4196 5892 4820 5892 4832 418 547 0 3 97
0 3 0 7056004 5816 176 4184 6172 6400 6172 6400 418 579 0 4 96
0 3 0 7055912 5688 192 4184 4720 4096 4736 4112 355 456 0 2 98
0 3 1 7055852 5300 180 4220 5624 5068 5720 5072 408 526 0 2 98
0 3 0 7055992 5228 176 4220 6384 5744 6384 5744 427 586 0 1 99
0 3 0 7055900 5232 176 4220 6016 5676 6016 5676 417 545 0 2 98
0 3 0 7056396 5844 180 4220 5644 5656 5656 5660 402 560 0 1 99
0 3 1 7056476 6012 176 4216 6104 6144 6104 6144 411 582 0 1 99
0 3 0 7056084 5592 176 4220 5540 4452 5540 4452 386 525 0 1 98
0 3 0 7055948 5400 176 4220 5184 4724 5184 4724 355 519 0 2 98
0 3 0 7056676 6136 176 4232 7360 7592 7360 7592 519 720 0 1 98
0 4 0 7056264 5572 176 4240 5888 5112 5908 5112 411 535 0 1 99
0 3 1 7055948 5444 180 4240 5632 4912 5656 4932 402 605 0 1 99
0 3 0 7055780 5088 176 4240 4932 4276 4932 4276 350 432 0 1 99
0 3 0 7055612 5128 176 4240 4564 4252 4564 4252 340 434 0 0 100

total used free shared buffers cached
Mem: 16493180 16488716 4464 0 184 4260
-/+ buffers/cache: 16484272 8908
Swap: 36941584 7054904 29886680

It still seems very responsive despite the load (and also despite being at
the other side of the Atlantic :).

URL:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.13pre5aa1.bz2
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.13pre5aa1/

If it swaps too much please try:

echo 6 > /proc/sys/vm/vm_scan_ratio
echo 2 > /proc/sys/vm/vm_mapped_ratio
echo 4 > /proc/sys/vm/vm_balance_ratio

Thanks!

--
Only in 2.4.13pre5aa1: 00_alpha-rest-pci-1
Only in 2.4.13pre5aa1: 00_alpha-tsunami-1

iommu alpha fixes from Jay Estabrook, Ivan Kokshaysky and Richard
Henderson.

Only in 2.4.13pre3aa1: 00_files_struct_rcu-2.4.10-04-1
Only in 2.4.13pre5aa1: 00_files_struct_rcu-2.4.10-04-2

Latest update from Maneesh Soni, including the memalloc failure bugfix
from Chip Salzenberg.

Only in 2.4.13pre3aa1: 00_highmem-deadlock-1
Only in 2.4.13pre5aa1: 00_highmem-deadlock-2

Rediffed so that it's self-contained (previously it wasn't very
readable).

Only in 2.4.13pre3aa1: 00_lowlatency-fixes-1
Only in 2.4.13pre5aa1: 00_lowlatency-fixes-2

Added a reschedule point, mainly for madvise, but it's in the ->nopage
path too.

Only in 2.4.13pre3aa1: 00_seg-reload-1

Dropped; the common case is user<->kernel, and that has to stay very fast
since it's the more important one. (Idling routers would do better not to
use irqs at all, but to dedicate a cpu to the polling work.)

Only in 2.4.13pre3aa1: 00_vm-3
Only in 2.4.13pre3aa1: 00_vm-3.2
Only in 2.4.13pre5aa1: 10_vm-4

Further vm work. Not sure if this is better than the previous one, but
now the few magic numbers are sysctl configurable. Includes many fixes
from Linus (that are also in pre5 of course) and one fix from Manfred
to prevent interrupts from eating the pfmemalloc reserved pool.

Only in 2.4.13pre3aa1: 10_highmem-debug-5
Only in 2.4.13pre5aa1: 20_highmem-debug-6

Fixed zone alignment to avoid crashes when highmem emulation is enabled.

Only in 2.4.13pre5aa1: 10_lvm-deadlock-fix-1

Dropped sync_dev from blkdev ->close to avoid a deadlock on lvm close(2),
fix from Alexander Viro.

Only in 2.4.13pre3aa1: 60_tux-2.4.10-ac12-H1.bz2
Only in 2.4.13pre5aa1: 60_tux-2.4.10-ac12-H8.bz2

Latest update from Ingo Molnar at http://www.redhat.com/~mingo/ .
--

Andrea


2001-10-19 05:47:40

by Robert Love

Subject: Re: 2.4.13pre5aa1

On Fri, 2001-10-19 at 00:19, Andrea Arcangeli wrote:
> Only in 2.4.13pre3aa1: 00_files_struct_rcu-2.4.10-04-1
> Only in 2.4.13pre5aa1: 00_files_struct_rcu-2.4.10-04-2

I want to point out to preempt-kernel users that RCU is not
preempt-safe. The implicit locking assumed from per-CPU data structures
is defeated by preemptibility.

(Actually, FWIW, I can think of ways to make RCU preemptible, but it
would involve changing the write-side quiescent code for the case
where pointers are carried over task switches. Probably not
worth it.)

This is not to say RCU is worthless with a preemptible kernel, but that
we need to make it safe (and then make sure it is still a performance
advantage, although I don't think this would add much overhead). Note the
fix is clean: simply wrap the read-side code in non-preemption statements.

I will hack up a patch when I get the time, but I would like to prevent
myself from maintaining the patch against a third tree ... where, oh
where, is 2.5? :)

Robert Love

2001-10-19 07:09:59

by Dipankar Sarma

Subject: Re: 2.4.13pre5aa1

In article <1003470485.913.13.camel@phantasy> Robert Love wrote:
> On Fri, 2001-10-19 at 00:19, Andrea Arcangeli wrote:
>> Only in 2.4.13pre3aa1: 00_files_struct_rcu-2.4.10-04-1
>> Only in 2.4.13pre5aa1: 00_files_struct_rcu-2.4.10-04-2

> I want to point out to preempt-kernel users that RCU is not
> preempt-safe. The implicit locking assumed from per-CPU data structures
> is defeated by preemptibility.

> (Actually, FWIW, I think I can think of ways to make RCU preemptible but
> it would involve changing the write-side quiescent code for the case
> where the pointers were carried over the task switches. Probably not
> worth it.)

I agree. Differentiating between context switches that do or don't
carry over pointers requires several additional complications
that are probably not worth it at this moment.


> This is not to say RCU is worthless with a preemptible kernel, but that
> we need to make it safe (and then make sure it is still a performance
> advantage, but I don't think this would add much overhead). Note this
> is clean, simply wrapping the read code in non-preemption statements.

Yes. Lookups of data protected by RCU should be done with preemption
disabled:

preempt_disable();
/* traverse the linked list or other RCU-protected data */
preempt_enable();
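The update side has the mirror-image requirement, hinted at by the "mem barrier needed for Alpha" comments in the patch below: the writer must fully initialise the new object before publishing the pointer, so that a lock-free reader can never observe a half-built array. A rough userspace analogue of that publish/lookup pairing, sketched with C11 atomics rather than the kernel primitives (the names here are illustrative, not kernel API):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative stand-in for an RCU-published structure such as the
 * fd array: readers dereference it without taking any lock. */
struct table {
    int *slots;
    int nslots;
};

/* Writer side: the release store guarantees all prior initialisation
 * of *fresh is visible before the pointer itself becomes visible. */
void publish(struct table *_Atomic *shared, struct table *fresh)
{
    atomic_store_explicit(shared, fresh, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above
 * (the kernel equivalent is rcu_dereference() plus the dependent-read
 * barrier that Alpha needs). */
struct table *lookup(struct table *_Atomic *shared)
{
    return atomic_load_explicit(shared, memory_order_acquire);
}
```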

Thanks
Dipankar
--
Dipankar Sarma <[email protected]> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

2001-10-20 14:02:48

by Maneesh Soni

Subject: Re: 2.4.13pre5aa1


In article <[email protected]> you wrote:

> Only in 2.4.13pre3aa1: 00_files_struct_rcu-2.4.10-04-1
> Only in 2.4.13pre5aa1: 00_files_struct_rcu-2.4.10-04-2

Hello Andrea,

Please apply the following update for the rcu fd patch.
It has fixes for two more bugs pointed out by Dipankar:

1. fs/file.c
in expand_fd_array, new_fds is not freed if the allocation for arg fails.

2. fs/file.c
kmalloc for *arg instead of arg (i.e. allocate sizeof(*arg), not the size
of the pointer) in expand_fd_array and expand_fdset.
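For what it's worth, bug 2 is an instance of a classic C allocation slip: sizeof(arg) is the size of the pointer (4 or 8 bytes), not of the object it points to. A small userspace sketch of the correct idiom, using a made-up struct name rather than the one from the patch:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rcu_fd_array: any struct larger
 * than a pointer exposes the bug. */
struct payload {
    void *array;
    int nfds;
    char extra[32];
};

/* Correct: allocate room for the object, not for the pointer.
 * malloc(sizeof(arg)) would hand back only pointer-sized storage,
 * so writes to arg->extra would overrun the buffer. */
static struct payload *alloc_payload(void)
{
    struct payload *arg = malloc(sizeof(*arg));  /* not sizeof(arg)! */
    if (arg)
        memset(arg, 0, sizeof(*arg));  /* safe: size matches the struct */
    return arg;
}
```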

Thank you,
Maneesh

--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5262355 Extn. 3999 email: [email protected]
http://lse.sourceforge.net/locking/rcupdate.html

2001-10-20 14:10:58

by Maneesh Soni

Subject: Re: 2.4.13pre5aa1



> Please apply the following update for the rcu fd patch.

Oops, forgot the patch...


--
Maneesh Soni
IBM Linux Technology Center,
IBM India Software Lab, Bangalore.
Phone: +91-80-5262355 Extn. 3999 email: [email protected]
http://lse.sourceforge.net/locking/rcupdate.html


diff -urN linux-2.4.13pre5/drivers/char/tty_io.c linux-2.4.13pre5-fs/drivers/char/tty_io.c
--- linux-2.4.13pre5/drivers/char/tty_io.c Sun Sep 23 00:21:43 2001
+++ linux-2.4.13pre5-fs/drivers/char/tty_io.c Sat Oct 20 18:39:44 2001
@@ -1847,7 +1847,6 @@
}
task_lock(p);
if (p->files) {
- read_lock(&p->files->file_lock);
for (i=0; i < p->files->max_fds; i++) {
filp = fcheck_files(p->files, i);
if (filp && (filp->f_op == &tty_fops) &&
@@ -1856,7 +1855,6 @@
break;
}
}
- read_unlock(&p->files->file_lock);
}
task_unlock(p);
}
diff -urN linux-2.4.13pre5/fs/exec.c linux-2.4.13pre5-fs/fs/exec.c
--- linux-2.4.13pre5/fs/exec.c Wed Sep 19 02:09:32 2001
+++ linux-2.4.13pre5-fs/fs/exec.c Sat Oct 20 18:39:44 2001
@@ -482,7 +482,7 @@
{
long j = -1;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
for (;;) {
unsigned long set, i;

@@ -494,16 +494,16 @@
if (!set)
continue;
files->close_on_exec->fds_bits[j] = 0;
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
for ( ; set ; i++,set >>= 1) {
if (set & 1) {
sys_close(i);
}
}
- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);

}
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
}

/*
diff -urN linux-2.4.13pre5/fs/fcntl.c linux-2.4.13pre5-fs/fs/fcntl.c
--- linux-2.4.13pre5/fs/fcntl.c Tue Sep 18 01:46:30 2001
+++ linux-2.4.13pre5-fs/fs/fcntl.c Sat Oct 20 18:39:44 2001
@@ -64,7 +64,7 @@
int error;
int start;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);

repeat:
/*
@@ -110,7 +110,7 @@
{
FD_SET(fd, files->open_fds);
FD_CLR(fd, files->close_on_exec);
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
fd_install(fd, file);
}

@@ -126,7 +126,7 @@
return ret;

out_putf:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
fput(file);
return ret;
}
@@ -137,7 +137,7 @@
struct file * file, *tofree;
struct files_struct * files = current->files;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
if (!(file = fcheck(oldfd)))
goto out_unlock;
err = newfd;
@@ -168,7 +168,7 @@
files->fd[newfd] = file;
FD_SET(newfd, files->open_fds);
FD_CLR(newfd, files->close_on_exec);
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);

if (tofree)
filp_close(tofree, files);
@@ -176,11 +176,11 @@
out:
return err;
out_unlock:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
goto out;

out_fput:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
fput(file);
goto out;
}
diff -urN linux-2.4.13pre5/fs/file.c linux-2.4.13pre5-fs/fs/file.c
--- linux-2.4.13pre5/fs/file.c Sat Feb 10 00:59:44 2001
+++ linux-2.4.13pre5-fs/fs/file.c Sat Oct 20 18:39:44 2001
@@ -13,7 +13,20 @@
#include <linux/vmalloc.h>

#include <asm/bitops.h>
+#include <linux/rcupdate.h>

+struct rcu_fd_array {
+ struct rcu_head rh;
+ struct file **array;
+ int nfds;
+};
+
+struct rcu_fd_set {
+ struct rcu_head rh;
+ fd_set *openset;
+ fd_set *execset;
+ int nfds;
+};

/*
* Allocate an fd array, using kmalloc or vmalloc.
@@ -48,6 +61,13 @@
vfree(array);
}

+static void fd_array_callback(void *arg)
+{
+ struct rcu_fd_array *a = (struct rcu_fd_array *) arg;
+ free_fd_array(a->array, a->nfds);
+ kfree(arg);
+}
+
/*
* Expand the fd array in the files_struct. Called with the files
* spinlock held for write.
@@ -55,8 +75,9 @@

int expand_fd_array(struct files_struct *files, int nr)
{
- struct file **new_fds;
- int error, nfds;
+ struct file **new_fds = NULL;
+ int error, nfds = 0;
+ struct rcu_fd_array *arg;


error = -EMFILE;
@@ -64,7 +85,7 @@
goto out;

nfds = files->max_fds;
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);

/*
* Expand to the max in easy steps, and keep expanding it until
@@ -88,18 +109,17 @@

error = -ENOMEM;
new_fds = alloc_fd_array(nfds);
- write_lock(&files->file_lock);
- if (!new_fds)
+ arg = (struct rcu_fd_array *) kmalloc(sizeof(*arg), GFP_ATOMIC);
+
+ spin_lock(&files->file_lock);
+ if (!new_fds || !arg)
goto out;

/* Copy the existing array and install the new pointer */

if (nfds > files->max_fds) {
- struct file **old_fds;
- int i;
-
- old_fds = xchg(&files->fd, new_fds);
- i = xchg(&files->max_fds, nfds);
+ struct file **old_fds = files->fd;
+ int i = files->max_fds;

/* Don't copy/clear the array if we are creating a new
fd array for fork() */
@@ -108,19 +128,36 @@
/* clear the remainder of the array */
memset(&new_fds[i], 0,
(nfds-i) * sizeof(struct file *));
+ }

- write_unlock(&files->file_lock);
- free_fd_array(old_fds, i);
- write_lock(&files->file_lock);
+ /* mem barrier needed for Alpha*/
+ files->fd = new_fds;
+ /* mem barrier needed for Alpha*/
+ files->max_fds = nfds;
+
+ if (i) {
+ arg->array = old_fds;
+ arg->nfds = i;
+ call_rcu(&arg->rh, fd_array_callback, arg);
+ }
+ else {
+ kfree(arg);
}
} else {
/* Somebody expanded the array while we slept ... */
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
free_fd_array(new_fds, nfds);
- write_lock(&files->file_lock);
+ kfree(arg);
+ spin_lock(&files->file_lock);
}
- error = 0;
+
+ return 0;
out:
+ if (new_fds)
+ free_fd_array(new_fds, nfds);
+ if (arg)
+ kfree(arg);
+
return error;
}

@@ -157,6 +194,14 @@
vfree(array);
}

+static void fd_set_callback (void *arg)
+{
+ struct rcu_fd_set *a = (struct rcu_fd_set *) arg;
+ free_fdset(a->openset, a->nfds);
+ free_fdset(a->execset, a->nfds);
+ kfree(arg);
+}
+
/*
* Expand the fdset in the files_struct. Called with the files spinlock
* held for write.
@@ -165,13 +210,14 @@
{
fd_set *new_openset = 0, *new_execset = 0;
int error, nfds = 0;
+ struct rcu_fd_set *arg = NULL;

error = -EMFILE;
if (files->max_fdset >= NR_OPEN || nr >= NR_OPEN)
goto out;

nfds = files->max_fdset;
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);

/* Expand to the max in easy steps */
do {
@@ -187,46 +233,56 @@
error = -ENOMEM;
new_openset = alloc_fdset(nfds);
new_execset = alloc_fdset(nfds);
- write_lock(&files->file_lock);
- if (!new_openset || !new_execset)
+ arg = (struct rcu_fd_set *) kmalloc(sizeof(*arg), GFP_ATOMIC);
+ spin_lock(&files->file_lock);
+ if (!new_openset || !new_execset || !arg)
goto out;

error = 0;

/* Copy the existing tables and install the new pointers */
if (nfds > files->max_fdset) {
- int i = files->max_fdset / (sizeof(unsigned long) * 8);
- int count = (nfds - files->max_fdset) / 8;
+ fd_set * old_openset = files->open_fds;
+ fd_set * old_execset = files->close_on_exec;
+ int old_nfds = files->max_fdset;
+ int i = old_nfds / (sizeof(unsigned long) * 8);
+ int count = (nfds - old_nfds) / 8;

/*
* Don't copy the entire array if the current fdset is
* not yet initialised.
*/
if (i) {
- memcpy (new_openset, files->open_fds, files->max_fdset/8);
- memcpy (new_execset, files->close_on_exec, files->max_fdset/8);
+ memcpy (new_openset, old_openset, old_nfds/8);
+ memcpy (new_execset, old_execset, old_nfds/8);
memset (&new_openset->fds_bits[i], 0, count);
memset (&new_execset->fds_bits[i], 0, count);
}

- nfds = xchg(&files->max_fdset, nfds);
- new_openset = xchg(&files->open_fds, new_openset);
- new_execset = xchg(&files->close_on_exec, new_execset);
- write_unlock(&files->file_lock);
- free_fdset (new_openset, nfds);
- free_fdset (new_execset, nfds);
- write_lock(&files->file_lock);
+ /* mem barrier needed for Alpha*/
+ files->open_fds = new_openset;
+ files->close_on_exec = new_execset;
+ /* mem barrier needed for Alpha*/
+ files->max_fdset = nfds;
+
+ arg->openset = old_openset;
+ arg->execset = old_execset;
+ arg->nfds = nfds;
+ call_rcu(&arg->rh, fd_set_callback, arg);
+
return 0;
}
/* Somebody expanded the array while we slept ... */

out:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
if (new_openset)
free_fdset(new_openset, nfds);
if (new_execset)
free_fdset(new_execset, nfds);
- write_lock(&files->file_lock);
+ if (arg)
+ kfree(arg);
+ spin_lock(&files->file_lock);
return error;
}

diff -urN linux-2.4.13pre5/fs/file_table.c linux-2.4.13pre5-fs/fs/file_table.c
--- linux-2.4.13pre5/fs/file_table.c Tue Sep 18 01:46:30 2001
+++ linux-2.4.13pre5-fs/fs/file_table.c Sat Oct 20 18:39:44 2001
@@ -129,13 +129,22 @@
struct file * fget(unsigned int fd)
{
struct file * file;
- struct files_struct *files = current->files;

- read_lock(&files->file_lock);
file = fcheck(fd);
- if (file)
+ if (file) {
get_file(file);
- read_unlock(&files->file_lock);
+
+ /* before returning check again if someone (as of now sys_close)
+ * has nullified the fd_array entry, if yes then we might have
+ * failed fput call for him by doing get_file() so do the
+ * favour of doing fput for him.
+ */
+
+ if (!(fcheck(fd))) {
+ fput(file);
+ return NULL;
+ }
+ }
return file;
}

diff -urN linux-2.4.13pre5/fs/open.c linux-2.4.13pre5-fs/fs/open.c
--- linux-2.4.13pre5/fs/open.c Sat Oct 20 18:46:09 2001
+++ linux-2.4.13pre5-fs/fs/open.c Sat Oct 20 18:39:44 2001
@@ -719,7 +719,7 @@
int fd, error;

error = -EMFILE;
- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);

repeat:
fd = find_next_zero_bit(files->open_fds,
@@ -768,7 +768,7 @@
error = fd;

out:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
return error;
}

@@ -849,7 +849,7 @@
struct file * filp;
struct files_struct *files = current->files;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
if (fd >= files->max_fds)
goto out_unlock;
filp = files->fd[fd];
@@ -858,11 +858,11 @@
files->fd[fd] = NULL;
FD_CLR(fd, files->close_on_exec);
__put_unused_fd(files, fd);
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
return filp_close(filp, files);

out_unlock:
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
return -EBADF;
}

diff -urN linux-2.4.13pre5/fs/proc/base.c linux-2.4.13pre5-fs/fs/proc/base.c
--- linux-2.4.13pre5/fs/proc/base.c Thu Oct 11 12:12:47 2001
+++ linux-2.4.13pre5-fs/fs/proc/base.c Sat Oct 20 18:39:44 2001
@@ -754,12 +754,10 @@
task_unlock(task);
if (!files)
goto out_unlock;
- read_lock(&files->file_lock);
file = inode->u.proc_i.file = fcheck_files(files, fd);
if (!file)
goto out_unlock2;
get_file(file);
- read_unlock(&files->file_lock);
put_files_struct(files);
inode->i_op = &proc_pid_link_inode_operations;
inode->i_size = 64;
@@ -775,7 +773,6 @@

out_unlock2:
put_files_struct(files);
- read_unlock(&files->file_lock);
out_unlock:
iput(inode);
out:
diff -urN linux-2.4.13pre5/fs/select.c linux-2.4.13pre5-fs/fs/select.c
--- linux-2.4.13pre5/fs/select.c Tue Sep 11 01:34:33 2001
+++ linux-2.4.13pre5-fs/fs/select.c Sat Oct 20 18:39:44 2001
@@ -167,9 +167,7 @@
int retval, i, off;
long __timeout = *timeout;

- read_lock(&current->files->file_lock);
retval = max_select_fd(n, fds);
- read_unlock(&current->files->file_lock);

if (retval < 0)
return retval;
diff -urN linux-2.4.13pre5/include/linux/file.h linux-2.4.13pre5-fs/include/linux/file.h
--- linux-2.4.13pre5/include/linux/file.h Wed Aug 23 23:52:26 2000
+++ linux-2.4.13pre5-fs/include/linux/file.h Sat Oct 20 18:39:44 2001
@@ -12,21 +12,19 @@
{
struct files_struct *files = current->files;
int res;
- read_lock(&files->file_lock);
res = FD_ISSET(fd, files->close_on_exec);
- read_unlock(&files->file_lock);
return res;
}

static inline void set_close_on_exec(unsigned int fd, int flag)
{
struct files_struct *files = current->files;
- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
if (flag)
FD_SET(fd, files->close_on_exec);
else
FD_CLR(fd, files->close_on_exec);
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
}

static inline struct file * fcheck_files(struct files_struct *files, unsigned int fd)
@@ -66,9 +64,9 @@
{
struct files_struct *files = current->files;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
__put_unused_fd(files, fd);
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
}

/*
@@ -88,11 +86,11 @@
{
struct files_struct *files = current->files;

- write_lock(&files->file_lock);
+ spin_lock(&files->file_lock);
if (files->fd[fd])
BUG();
files->fd[fd] = file;
- write_unlock(&files->file_lock);
+ spin_unlock(&files->file_lock);
}

void put_files_struct(struct files_struct *fs);
diff -urN linux-2.4.13pre5/include/linux/sched.h linux-2.4.13pre5-fs/include/linux/sched.h
--- linux-2.4.13pre5/include/linux/sched.h Thu Oct 11 12:14:34 2001
+++ linux-2.4.13pre5-fs/include/linux/sched.h Sat Oct 20 18:39:44 2001
@@ -171,7 +171,7 @@
*/
struct files_struct {
atomic_t count;
- rwlock_t file_lock; /* Protects all the below members. Nests inside tsk->alloc_lock */
+ spinlock_t file_lock; /* Protects all the below members. Nests inside tsk->alloc_lock */
int max_fds;
int max_fdset;
int next_fd;
@@ -186,7 +186,7 @@
#define INIT_FILES \
{ \
count: ATOMIC_INIT(1), \
- file_lock: RW_LOCK_UNLOCKED, \
+ file_lock: SPIN_LOCK_UNLOCKED, \
max_fds: NR_OPEN_DEFAULT, \
max_fdset: __FD_SETSIZE, \
next_fd: 0, \
diff -urN linux-2.4.13pre5/kernel/fork.c linux-2.4.13pre5-fs/kernel/fork.c
--- linux-2.4.13pre5/kernel/fork.c Tue Sep 18 10:16:04 2001
+++ linux-2.4.13pre5-fs/kernel/fork.c Sat Oct 20 18:39:44 2001
@@ -440,7 +440,7 @@

atomic_set(&newf->count, 1);

- newf->file_lock = RW_LOCK_UNLOCKED;
+ newf->file_lock = SPIN_LOCK_UNLOCKED;
newf->next_fd = 0;
newf->max_fds = NR_OPEN_DEFAULT;
newf->max_fdset = __FD_SETSIZE;
@@ -453,13 +453,12 @@
size = oldf->max_fdset;
if (size > __FD_SETSIZE) {
newf->max_fdset = 0;
- write_lock(&newf->file_lock);
+ spin_lock(&newf->file_lock);
error = expand_fdset(newf, size-1);
- write_unlock(&newf->file_lock);
+ spin_unlock(&newf->file_lock);
if (error)
goto out_release;
}
- read_lock(&oldf->file_lock);

open_files = count_open_files(oldf, size);

@@ -470,15 +469,13 @@
*/
nfds = NR_OPEN_DEFAULT;
if (open_files > nfds) {
- read_unlock(&oldf->file_lock);
newf->max_fds = 0;
- write_lock(&newf->file_lock);
+ spin_lock(&newf->file_lock);
error = expand_fd_array(newf, open_files-1);
- write_unlock(&newf->file_lock);
+ spin_unlock(&newf->file_lock);
if (error)
goto out_release;
nfds = newf->max_fds;
- read_lock(&oldf->file_lock);
}

old_fds = oldf->fd;
@@ -493,7 +490,6 @@
get_file(f);
*new_fds++ = f;
}
- read_unlock(&oldf->file_lock);

/* compute the remainder to be cleared */
size = (newf->max_fds - open_files) * sizeof(struct file *);
diff -urN linux-2.4.13pre5/net/ipv4/netfilter/ipt_owner.c linux-2.4.13pre5-fs/net/ipv4/netfilter/ipt_owner.c
--- linux-2.4.13pre5/net/ipv4/netfilter/ipt_owner.c Mon Oct 1 00:56:08 2001
+++ linux-2.4.13pre5-fs/net/ipv4/netfilter/ipt_owner.c Sat Oct 20 18:39:44 2001
@@ -25,16 +25,13 @@
task_lock(p);
files = p->files;
if(files) {
- read_lock(&files->file_lock);
for (i=0; i < files->max_fds; i++) {
if (fcheck_files(files, i) == skb->sk->socket->file) {
- read_unlock(&files->file_lock);
task_unlock(p);
read_unlock(&tasklist_lock);
return 1;
}
}
- read_unlock(&files->file_lock);
}
task_unlock(p);
out:
@@ -58,14 +55,12 @@
task_lock(p);
files = p->files;
if (files) {
- read_lock(&files->file_lock);
for (i=0; i < files->max_fds; i++) {
if (fcheck_files(files, i) == file) {
found = 1;
break;
}
}
- read_unlock(&files->file_lock);
}
task_unlock(p);
if(found)

2001-10-21 19:18:06

by jogi

Subject: Re: 2.4.13pre5aa1

On Fri, Oct 19, 2001 at 06:19:14AM +0200, Andrea Arcangeli wrote:
> The vm part in particular is right now getting stressed on a 16G box kindly
> provided by osdlab.org and it hasn't exhibited any problem yet. This is a trace
> of the workload that is running on the machine overnight.

Hello Andrea,

I did some performance comparisons on my machine (Athlon 1200, 256MB DDR,
IDE hdd). Basically all I did was time the complete build process of a
kernel:

#!/bin/bash

REV=`uname -r`-`date +%s`
LF="/usr/src/logfile-$REV"
PAR=100

tar xvIf /home/public/Linux/kernel/v2.4/linux-2.4.12.tar.bz2 >>$LF 2>&1
cd linux
bzip2 -cd /home/public/Linux/kernel/v2.4/patch-2.4.13-pre5.bz2 | patch -p1 >>$LF 2>&1
bzip2 -cd /home/public/Linux/kernel/v2.4/2.4.13pre5aa1.bz2 | patch -p1 >>$LF 2>&1
patch -p0 < ../Makefile.patch >>$LF 2>&1
cp ../config-2.4.13-pre5aa1 .config
(make oldconfig dep clean && make -j$PAR bzImage modules && cat /proc/loadavg && cat /proc/meminfo) >>$LF 2>&1

I ran the above script five times in a row after a fresh reboot into single
user mode so that no other processes could interfere. Here are the results:

j25 j50 j75 j100
2.4.12-ac3: 4:52.16 5:21.15 6:22.85 10:26.70
2.4.12-ac3: 4:52.31 5:22.52 8:40.97 15:55.37
2.4.12-ac3: 4:51.51 5:18.04 6:08.80 7:09.85
2.4.12-ac3: 4:52.47 5:10.81 5:51.02 8:53.40
2.4.12-ac3: 4:53.24 5:06.96 6:36.49 7:53.92
2.4.13-pre3aa1: 4:52.08 5:57.43 9:43.07 *
2.4.13-pre3aa1: 4:54.22 5:04.17 10:30.56 *
2.4.13-pre3aa1: 4:53.95 5:21.08 11:07.44 *
2.4.13-pre3aa1: 4:55.26 5:30.01 9:53.16 *
2.4.13-pre3aa1: 4:54.39 6:00.32 10:15.06 *
2.4.13-pre5aa1: 4:54.61 5:10.38 5:19.68 5:40.37
2.4.13-pre5aa1: 4:56.39 5:12.39 5:31.16 5:59.54
2.4.13-pre5aa1: 4:55.12 5:11.00 5:13.37 5:43.99
2.4.13-pre5aa1: 4:56.24 5:10.85 5:16.17 5:50.78
2.4.13-pre5aa1: 4:57.05 5:10.97 5:26.41 6:06.42

* Kernel build did not complete because OOM killer was activated.

If further info would be of interest, just send me an email.


Kind regards,

Jochen


--

Well, yeah ... I suppose there's no point in getting greedy, is there?

<< Calvin & Hobbes >>

2001-10-22 00:04:37

by Andrea Arcangeli

Subject: Re: 2.4.13pre5aa1

On Sun, Oct 21, 2001 at 09:17:26PM +0200, [email protected] wrote:
> 2.4.13-pre3aa1: 4:54.39 6:00.32 10:15.06 *
^^^^^^^^
> 2.4.13-pre5aa1: 4:54.61 5:10.38 5:19.68 5:40.37
^^^^^^^

this is interesting. I'm also wondering what you'd get if you used:

echo 8 > /proc/sys/vm/vm_scan_ratio
echo 1 > /proc/sys/vm/vm_mapped_ratio
echo 3 > /proc/sys/vm/vm_balance_ratio

(or also the other combination that I suggested in the other emails)

Anyway, you can probably skip the above test and wait for a further
update that changes more than just the default sysctl values. Notably
it introduces the PG_launder logic that originated from a discussion
with Marcelo and Linus, somehow resembling part of the PG_wait_for_IO
write throttling logic that I had in 2.4.12aa1 and 2.4.13pre3aa1. But I
doubt pre3aa1 was slower because of that, and in case the next -aa slows
down again I'll later ask you to try a one liner patch that disables
the write throttling for writepage again [like pre5aa1 did], just to
make sure it's not the one that hurts :)

thanks to you too for the feedback!

Andrea

2001-10-22 10:14:23

by Andrea Arcangeli

Subject: Re: 2.4.13pre5aa1

On Sat, Oct 20, 2001 at 07:40:24PM +0530, Maneesh Soni wrote:
>
> In article <[email protected]> you wrote:
>
> > Only in 2.4.13pre3aa1: 00_files_struct_rcu-2.4.10-04-1
> > Only in 2.4.13pre5aa1: 00_files_struct_rcu-2.4.10-04-2
>
> Hello Andrea,
>
> Please apply the following update for the rcu fd patch.
> This has fixes for two more bugs pointed by Dipankar.
>
> 1. fs/file.c
> in expand_fd_array new_fds is not freed if allocation for arg fails.
>
> 2. fs/file.c
> kmalloc for arg instead of *arg in expand_fd_array and expand_fdset

thanks for the update!! Applied.

Andrea

2001-10-22 11:12:58

by jogi

Subject: Re: 2.4.13pre5aa1

On Sun, Oct 21, 2001 at 05:50:30PM -0200, Rik van Riel wrote:
> On 21 Oct 2001 [email protected] wrote:
>
> > 2.4.12-ac3: 4:52.16 5:21.15 6:22.85 10:26.70
>
> > If further infos are interesting just send me an email.
>
> It would be cool if you could test this with 2.4.12-ac3 and
> my -vmpatch and -freeswap patches against this kernel ;)
>
> Patches on http://www.surriel.com/patches/

As promised here are the results for 2.4.12-ac3 + vmpatch
and freeswap.

2.4.12-ac3: 8:20.46
2.4.12-ac3: 6:36.67
2.4.12-ac3: 6:37.61
2.4.12-ac3: 7:36.24
2.4.12-ac3: 7:21.24

These times are for -j100.

Kind regards,

Jogi

--

Well, yeah ... I suppose there's no point in getting greedy, is there?

<< Calvin & Hobbes >>