2000-12-05 18:26:42

by Tigran Aivazian

[permalink] [raw]
Subject: [patch-2.4.0-test12-pre5] optimized get_empty_filp()

Hi Linus,

It is quite clear that dropping and retaking the files_lock in
fs/file_table.c:get_empty_filp() at the label new_one is absolutely
unnecessary. The only reason one could think of was to "hold the lock for
as short time as possible" but a minute's thought reveals that such reason
is invalid (i.e. one is more likely to waste time spinning on the lock
than to save it by dropping/retaking it, given the relative duration of
the instructions we execute there without the lock).

So, the patch (tested under 2.4.0-test12-pre5) is below.

Regards,
Tigran

--- linux/fs/file_table.c Fri Nov 17 19:36:27 2000
+++ work/fs/file_table.c Tue Dec 5 16:52:06 2000
@@ -40,13 +40,11 @@
list_del(&f->f_list);
files_stat.nr_free_files--;
new_one:
- file_list_unlock();
memset(f, 0, sizeof(*f));
atomic_set(&f->f_count,1);
f->f_version = ++event;
f->f_uid = current->fsuid;
f->f_gid = current->fsgid;
- file_list_lock();
list_add(&f->f_list, &anon_list);
file_list_unlock();
return f;


2000-12-05 22:31:37

by Peter Samuelson

[permalink] [raw]
Subject: Re: [patch-2.4.0-test12-pre5] optimized get_empty_filp()


[Tigran Aivazian]
> The only reason one could think of was to "hold the lock for as short
> time as possible" but a minute's thought reveals that such reason is
> invalid (i.e. one is more likely to waste time spinning on the lock
> than to save it by dropping/retaking it, given the relative duration
> of the instructions we execute there without the lock).

If there is no contention, you do not spin, no time wasted

If there *is* contention, you deserialize the routine just a little
bit, which is generally a Good Thing.

Whether a memset of 92 bytes (on 32-bit arch), plus an atomic_set(),
are worth deserializing, I do not know.

Peter

2000-12-05 23:02:31

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [patch-2.4.0-test12-pre5] optimized get_empty_filp()

On Tue, 5 Dec 2000, Peter Samuelson wrote:
> [Tigran Aivazian]
> > The only reason one could think of was to "hold the lock for as short
> > time as possible" but a minute's thought reveals that such reason is
> > invalid (i.e. one is more likely to waste time spinning on the lock
> > than to save it by dropping/retaking it, given the relative duration
> > of the instructions we execute there without the lock).
>
> If there is no contention, you do not spin, no time wasted

wrong -- time is wasted -- 1 decb for lock and 1 movb for unlock. No time
is wasted on spinning but 2 instructions wasted for taking/dropping the
lock.

>
> If there *is* contention, you deserialize the routine just a little
> bit, which is generally a Good Thing.
>
> Whether a memset of 92 bytes (on 32-bit arch), plus an atomic_set(),
> are worth deserializing, I do not know.
>

Of course, they are worth it. Actually, I don't understand how can you
even doubt it? Even a single cycle of code executed for _no reason_ must
be removed, if for no other reason than to make the code easier to
understand and prevent people from asking questions like "why does X do Y
for no reason?" -- if there are no such items "Y" then the questions will
also cease.

Regards,
Tigran

2000-12-05 23:21:51

by Peter Samuelson

[permalink] [raw]
Subject: Re: [patch-2.4.0-test12-pre5] optimized get_empty_filp()


[Peter Samuelson]
> > Whether a memset of 92 bytes (on 32-bit arch), plus an
> > atomic_set(), are worth deserializing, I do not know.

[Tigran Aivazian]
> Of course, they are worth it. Actually, I don't understand how can
> you even doubt it?

Clearly we are talking at cross-purposes here. I do realize that
spin_lock+spin_unlock have a non-zero cost.

The question is whether or not it is worth taking a lock again (with
that non-zero cost) to achieve the gain of doing the 92-byte memset and
the atomic_set in parallel with other CPUs. In other words, by locking
and unlocking twice, you reduce the contention on the lock. Is this,
however, worth the extra cycles and bus traffic? I don't know.

> Even a single cycle of code executed for _no reason_ must be removed,
> if for no other reason than to make the code easier to understand and
> prevent people from asking questions like "why does X do Y for no
> reason?"

Taken to its logical conclusion, you are arguing for Linux 2.0.x SMP.
Nobody takes any locks at all, except the BKL at kernel entry points.
Clearly that is suboptimal for contention.

Peter

2000-12-06 07:32:03

by Tigran Aivazian

[permalink] [raw]
Subject: Re: [patch-2.4.0-test12-pre5] optimized get_empty_filp()

On Tue, 5 Dec 2000, Peter Samuelson wrote:
> The question is whether or not it is worth taking a lock again (with
> that non-zero cost) to achieve the gain of doing the 92-byte memset and
> the atomic_set in parallel with other CPUs. In other words, by locking
> and unlocking twice, you reduce the contention on the lock. Is this,
> however, worth the extra cycles and bus traffic? I don't know.

Ok, that's a different (opposite) question and so if the answer previously
was "Yes" then obviously the answer to this one is "No". I.e. it is not
worth retaking the lock for the sake of reducing contention on the
lock. And it is not just a couple of extra instructions but it is also a
bus lock (or cache lock if you are lucky, on P6 only). And that is exactly
what my patch does -- removes the unnecessary bus locking and at least 2
instructions.

Regards,
Tigran

PS. Your previous statement (when clarified and explained by you in this
message) made me think that you misread the patch, i.e. saw "+" when there
was "-" -- I removed those lines and not added them... Here is the patch
again for you to see it clearer:

--- linux/fs/file_table.c Fri Nov 17 19:36:27 2000
+++ work/fs/file_table.c Tue Dec 5 16:52:06 2000
@@ -40,13 +40,11 @@
list_del(&f->f_list);
files_stat.nr_free_files--;
new_one:
- file_list_unlock();
memset(f, 0, sizeof(*f));
atomic_set(&f->f_count,1);
f->f_version = ++event;
f->f_uid = current->fsuid;
f->f_gid = current->fsgid;
- file_list_lock();
list_add(&f->f_list, &anon_list);
file_list_unlock();
return f;