Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.
This reduces list search and lock hold time, as well as getting rid of
one of the few uses of file_list_lock, which Ingo identified as a
scalability problem.
Previously we called dq_op->initialize for every inode hanging off
a writeable file that wasn't initialized before. Now we call it
for every inode that has a non-zero i_writecount, i.e. a writeable
file descriptor referring to it.
Thanks a lot to Jan Kara for running this patch through his quota
test harness.
Note: this is on top of '[PATCH] move remove_dquot_ref to dquot.c'
which I sent out yesterday.
Signed-off-by: Christoph Hellwig <[email protected]>
Index: linux-2.6/fs/dquot.c
===================================================================
--- linux-2.6.orig/fs/dquot.c 2007-02-05 18:54:36.000000000 +0100
+++ linux-2.6/fs/dquot.c 2007-02-05 18:59:48.000000000 +0100
@@ -689,23 +689,27 @@
/* This routine is guarded by dqonoff_mutex mutex */
static void add_dquot_ref(struct super_block *sb, int type)
{
- struct list_head *p;
+ struct inode *inode;
restart:
- file_list_lock();
- list_for_each(p, &sb->s_files) {
- struct file *filp = list_entry(p, struct file, f_u.fu_list);
- struct inode *inode = filp->f_path.dentry->d_inode;
- if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
- struct dentry *dentry = dget(filp->f_path.dentry);
- file_list_unlock();
- sb->dq_op->initialize(inode, type);
- dput(dentry);
- /* As we may have blocked we had better restart... */
- goto restart;
- }
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
+ if (!atomic_read(&inode->i_writecount))
+ continue;
+ if (!dqinit_needed(inode, type))
+ continue;
+ if (inode->i_state & (I_FREEING|I_WILL_FREE))
+ continue;
+
+ __iget(inode);
+ spin_unlock(&inode_lock);
+
+ sb->dq_op->initialize(inode, type);
+ iput(inode);
+ /* As we may have blocked we had better restart... */
+ goto restart;
}
- file_list_unlock();
+ spin_unlock(&inode_lock);
}
/* Return 0 if dqput() won't block (note that 1 doesn't necessarily mean blocking) */
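For reference, dqinit_needed() in the loop above decides whether an inode
still needs dq_op->initialize for the given quota type. From memory of this
era's fs/dquot.c it looks roughly like this, so treat the details as
approximate rather than authoritative:

static int dqinit_needed(struct inode *inode, int type)
{
        int cnt;

        if (IS_NOQUOTA(inode))
                return 0;
        /* type == -1 means "any quota type" */
        if (type != -1)
                return inode->i_dquot[type] == NODQUOT;
        for (cnt = 0; cnt < MAXQUOTAS; cnt++)
                if (inode->i_dquot[cnt] == NODQUOT)
                        return 1;
        return 0;
}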
On Tue 06-02-07 14:23:33, Christoph Hellwig wrote:
> Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.
> This reduces list search and lock hold time, as well as getting rid of
> one of the few uses of file_list_lock, which Ingo identified as a
> scalability problem.
>
> Previously we called dq_op->initialize for every inode hanging off
> a writeable file that wasn't initialized before. Now we call it
> for every inode that has a non-zero i_writecount, i.e. a writeable
> file descriptor referring to it.
>
> Thanks a lot to Jan Kara for running this patch through his quota
> test harness.
>
> Note: this is on top of '[PATCH] move remove_dquot_ref to dquot.c'
> which I sent out yesterday.
>
>
> Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
> Index: linux-2.6/fs/dquot.c
> ===================================================================
> --- linux-2.6.orig/fs/dquot.c 2007-02-05 18:54:36.000000000 +0100
> +++ linux-2.6/fs/dquot.c 2007-02-05 18:59:48.000000000 +0100
> @@ -689,23 +689,27 @@
> /* This routine is guarded by dqonoff_mutex mutex */
> static void add_dquot_ref(struct super_block *sb, int type)
> {
> - struct list_head *p;
> + struct inode *inode;
>
> restart:
> - file_list_lock();
> - list_for_each(p, &sb->s_files) {
> - struct file *filp = list_entry(p, struct file, f_u.fu_list);
> - struct inode *inode = filp->f_path.dentry->d_inode;
> - if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
> - struct dentry *dentry = dget(filp->f_path.dentry);
> - file_list_unlock();
> - sb->dq_op->initialize(inode, type);
> - dput(dentry);
> - /* As we may have blocked we had better restart... */
> - goto restart;
> - }
> + spin_lock(&inode_lock);
> + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> + if (!atomic_read(&inode->i_writecount))
> + continue;
> + if (!dqinit_needed(inode, type))
> + continue;
> + if (inode->i_state & (I_FREEING|I_WILL_FREE))
> + continue;
> +
> + __iget(inode);
> + spin_unlock(&inode_lock);
> +
> + sb->dq_op->initialize(inode, type);
> + iput(inode);
> + /* As we may have blocked we had better restart... */
> + goto restart;
> }
> - file_list_unlock();
> + spin_unlock(&inode_lock);
> }
>
> /* Return 0 if dqput() won't block (note that 1 doesn't necessarily mean blocking) */
Honza
--
Jan Kara <[email protected]>
SuSE CR Labs
On Tue, 6 Feb 2007 14:23:33 +0100
Christoph Hellwig <[email protected]> wrote:
> static void add_dquot_ref(struct super_block *sb, int type)
> {
> - struct list_head *p;
> + struct inode *inode;
>
> restart:
> - file_list_lock();
> - list_for_each(p, &sb->s_files) {
> - struct file *filp = list_entry(p, struct file, f_u.fu_list);
> - struct inode *inode = filp->f_path.dentry->d_inode;
> - if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
> - struct dentry *dentry = dget(filp->f_path.dentry);
> - file_list_unlock();
> - sb->dq_op->initialize(inode, type);
> - dput(dentry);
> - /* As we may have blocked we had better restart... */
> - goto restart;
> - }
> + spin_lock(&inode_lock);
> + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> + if (!atomic_read(&inode->i_writecount))
> + continue;
> + if (!dqinit_needed(inode, type))
> + continue;
> + if (inode->i_state & (I_FREEING|I_WILL_FREE))
> + continue;
> +
> + __iget(inode);
> + spin_unlock(&inode_lock);
> +
> + sb->dq_op->initialize(inode, type);
> + iput(inode);
> + /* As we may have blocked we had better restart... */
> + goto restart;
> }
> - file_list_unlock();
> + spin_unlock(&inode_lock);
> }
That loop has (and had) up to O(n^n) operations. Is there something which
prevents this from going insane?
On Tue, Feb 06, 2007 at 03:50:01PM -0800, Andrew Morton wrote:
> On Tue, 6 Feb 2007 14:23:33 +0100
> Christoph Hellwig <[email protected]> wrote:
>
> > static void add_dquot_ref(struct super_block *sb, int type)
> > {
> > - struct list_head *p;
> > + struct inode *inode;
> >
> > restart:
> > - file_list_lock();
> > - list_for_each(p, &sb->s_files) {
> > - struct file *filp = list_entry(p, struct file, f_u.fu_list);
> > - struct inode *inode = filp->f_path.dentry->d_inode;
> > - if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
> > - struct dentry *dentry = dget(filp->f_path.dentry);
> > - file_list_unlock();
> > - sb->dq_op->initialize(inode, type);
> > - dput(dentry);
> > - /* As we may have blocked we had better restart... */
> > - goto restart;
> > - }
> > + spin_lock(&inode_lock);
> > + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > + if (!atomic_read(&inode->i_writecount))
> > + continue;
> > + if (!dqinit_needed(inode, type))
> > + continue;
> > + if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > + continue;
> > +
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > +
> > + sb->dq_op->initialize(inode, type);
> > + iput(inode);
> > + /* As we may have blocked we had better restart... */
> > + goto restart;
> > }
> > - file_list_unlock();
> > + spin_unlock(&inode_lock);
> > }
>
> That loop has (and had) up to O(n^n) operations. Is there something which
> prevents this from going insane?
I don't think so. Then again it's only called when you call quotaon on
a mounted filesystem, and normally you don't have that many inodes
instantiated at that time.
On Tue 06-02-07 15:50:01, Andrew Morton wrote:
> On Tue, 6 Feb 2007 14:23:33 +0100
> Christoph Hellwig <[email protected]> wrote:
>
> > static void add_dquot_ref(struct super_block *sb, int type)
> > {
> > - struct list_head *p;
> > + struct inode *inode;
> >
> > restart:
> > - file_list_lock();
> > - list_for_each(p, &sb->s_files) {
> > - struct file *filp = list_entry(p, struct file, f_u.fu_list);
> > - struct inode *inode = filp->f_path.dentry->d_inode;
> > - if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
> > - struct dentry *dentry = dget(filp->f_path.dentry);
> > - file_list_unlock();
> > - sb->dq_op->initialize(inode, type);
> > - dput(dentry);
> > - /* As we may have blocked we had better restart... */
> > - goto restart;
> > - }
> > + spin_lock(&inode_lock);
> > + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
> > + if (!atomic_read(&inode->i_writecount))
> > + continue;
> > + if (!dqinit_needed(inode, type))
> > + continue;
> > + if (inode->i_state & (I_FREEING|I_WILL_FREE))
> > + continue;
> > +
> > + __iget(inode);
> > + spin_unlock(&inode_lock);
> > +
> > + sb->dq_op->initialize(inode, type);
> > + iput(inode);
> > + /* As we may have blocked we had better restart... */
> > + goto restart;
> > }
> > - file_list_unlock();
> > + spin_unlock(&inode_lock);
> > }
>
> That loop has (and had) up to O(n^n) operations. Is there something which
> prevents this from going insane?
Huh, I guess you meant O(n^2)... Yes, I know about this, but as Christoph
pointed out, the loop runs only when quotaon is called on a filesystem, and
only for inodes somebody writes to. Usually there are not that many such
inodes, so I'm not sure it's worth the trouble to fix this.
Honza
--
Jan Kara <[email protected]>
SuSE CR Labs
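The quadratic behavior comes from restarting at the list head after every
initialize: if n inodes need initialization, the n restarts rescan on
average n/2 entries each, i.e. O(n^2) list steps. For what it's worth, the
restart could be avoided by keeping a reference to the inode just processed,
so the list cursor stays valid across the inode_lock drop. A sketch,
untested, assuming the reference taken with __iget() keeps the inode linked
on sb->s_inodes while the lock is dropped:

static void add_dquot_ref(struct super_block *sb, int type)
{
        struct inode *inode, *old_inode = NULL;

        spin_lock(&inode_lock);
        list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                if (!atomic_read(&inode->i_writecount))
                        continue;
                if (!dqinit_needed(inode, type))
                        continue;
                if (inode->i_state & (I_FREEING|I_WILL_FREE))
                        continue;

                __iget(inode);
                spin_unlock(&inode_lock);

                /* iput() may block, so drop the previous reference only
                 * outside inode_lock */
                iput(old_inode);
                sb->dq_op->initialize(inode, type);
                /* we keep our reference so 'inode' stays on s_inodes and
                 * the iteration can continue from here after relocking */
                old_inode = inode;
                spin_lock(&inode_lock);
        }
        spin_unlock(&inode_lock);
        iput(old_inode);
}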
On Feb 7 2007 08:22, Christoph Hellwig wrote:
>> That loop has (and had) up to O(n^n) operations. Is there something which
>> prevents this from going insane?
>
>I don't think so. Then again it's only called when you call quotaon on
>a mounted filesystem, and normally you don't have that many inodes
>instantiated at that time.
With filesystems that can turn on their quota after mount time (about
every fs except xfs), I can surely have a ton of files open, and hence,
if I understand correctly, have lots of inodes instantiated.
Jan
--
On Wed, Feb 07, 2007 at 07:03:05PM +0100, Jan Engelhardt wrote:
> With filesystems that can turn on their quota after mount time (about
> every fs except xfs), I can surely have a ton of files open, and hence,
> if I understand correctly, have lots of inodes instantiated.
Yes, you can in theory. But turning on quota on a filesystem under
full-steam usage is not a common use case, and thus there is no point in
optimizing for it.
On Tue, Feb 06, 2007 at 02:23:33PM +0100, Christoph Hellwig wrote:
> Iterate over sb->s_inodes instead of sb->s_files in add_dquot_ref.
> This reduces list search and lock hold time, as well as getting rid of
> one of the few uses of file_list_lock, which Ingo identified as a
> scalability problem.
Christoph,
The i_mutex lock in the inode structure is also a heavy source of
contention when running a lot of parallel "find"s. I'm sure that folks
would be open to hearing suggestions on how to fix that.
bill
On Feb 7 2007 19:06, Christoph Hellwig wrote:
>On Wed, Feb 07, 2007 at 07:03:05PM +0100, Jan Engelhardt wrote:
>
>> With filesystems that can turn on their quota after mount time (about
>> every fs except xfs), I can surely have a ton of files open, and hence,
>> if I understand correctly, have lots of inodes instantiated.
>
>Yes, you can in theory. But turning on quota on a filesystem under
>full-steam usage is not a common use case, and thus there is no point in
>optimizing for it.
I put it to the test in a default scenario (quotaon sometime at boot).
SUSE Linux 10.1 with reiserfs(usrquota,grpquota).
This is the result:
add_dquot_ref: 30 files in sb->s_files for hda2
add_dquot_ref: Restarting after 12 files
add_dquot_ref: 30 files in sb->s_files for hda2
add_dquot_ref: Restarted 1 times
add_dquot_ref: 30 files in sb->s_files for hda2
add_dquot_ref: Restarting after 12 files
add_dquot_ref: 30 files in sb->s_files for hda2
add_dquot_ref: Restarted 1 times
Surprisingly few. I had expected to see more instantiated (but not necessarily
open) files.
Patch for reference:
Index: 18/fs/dquot.c
===================================================================
--- 18.orig/fs/dquot.c
+++ 18/fs/dquot.c
@@ -689,22 +689,35 @@ static int dqinit_needed(struct inode *i
static void add_dquot_ref(struct super_block *sb, int type)
{
struct list_head *p;
+ int restart = 0;
+ int num_files;
restart:
file_list_lock();
+ num_files = 0;
+ list_for_each(p, &sb->s_files) {
+ ++num_files;
+ }
+ printk(KERN_WARNING "add_dquot_ref: %d files in sb->s_files for %s\n",
+ num_files, sb->s_id);
+ num_files = 0;
list_for_each(p, &sb->s_files) {
struct file *filp = list_entry(p, struct file, f_u.fu_list);
struct inode *inode = filp->f_dentry->d_inode;
+ ++num_files;
if (filp->f_mode & FMODE_WRITE && dqinit_needed(inode, type)) {
struct dentry *dentry = dget(filp->f_dentry);
file_list_unlock();
sb->dq_op->initialize(inode, type);
dput(dentry);
/* As we may have blocked we had better restart... */
+ printk(KERN_WARNING "add_dquot_ref: Restarting after %d files\n", num_files);
+ ++restart;
goto restart;
}
}
file_list_unlock();
+ printk(KERN_WARNING "add_dquot_ref: Restarted %d times\n", restart);
}
/* Return 0 if dqput() won't block (note that 1 doesn't necessarily mean blocking) */
BTW, whilst looking for the function that returns a readable name, I came
across reiserfs_bdevname() and bdevname(). The former uses sb->s_id, the
latter struct gendisk->hd_name plus the partition number.
Could someone elaborate on why there are two ways?
Jan
--
ft: http://freshmeat.net/p/chaostables/
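As for the two naming interfaces: sb->s_id is a fixed string stamped into
the superblock at mount time, while bdevname() regenerates the name from the
underlying gendisk and partition on each call. Roughly, from memory of the
2.6 tree (so treat the details as approximate):

        char buf[BDEVNAME_SIZE];

        /* static copy filled in at mount time */
        printk(KERN_INFO "super: %s\n", sb->s_id);

        /* rebuilt from the gendisk + partition number on demand */
        printk(KERN_INFO "bdev:  %s\n", bdevname(sb->s_bdev, buf));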
On Thu, Feb 08, 2007 at 01:01:21AM -0800, Bill Huey wrote:
> Christoph,
>
> The i_mutex lock in the inode structure is also a heavy source of
> contention when running a lot of parallel "find"s. I'm sure that folks
> would be open to hearing suggestions on how to fix that.
Christoph,
And while you're at it, you should also know that dcache_lock is next
in line to be nixed out of existence if possible.
i_mutex is a bitch and I'm not even going to think about how to get
rid of it, since it's so widely used in so many places (file systems aren't
my thing anyway). Maybe more precise tracking of contention paths
would be useful to see if there's a pathological case creating a
cascade of contention events that could be nixed; I don't know.
About 1/10th of the lock-stat events I've logged report that the owner
of the rtmutex is the "current" task on a runqueue somewhere. An adaptive
lock would help with those contention events (spin waiting for the mutex
release while the owner is running), but really, the contention should be
avoided in the first place, since it shows (I think) a polynomial
increase in contention time as you add more processors to the mix.
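To make the adaptive idea concrete, here is a toy userspace sketch of the
strategy (an illustration, not the -rt rtmutex code): spin while the owner
is believed to be on a CPU, otherwise yield instead of burning cycles. A
real implementation would block on the mutex rather than call sched_yield(),
and would inspect the owner's task state rather than a flag:

#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

struct adaptive_lock {
        atomic_int  locked;         /* 0 = free, 1 = held */
        atomic_bool owner_running;  /* owner sets this around its critical section */
};

static void adaptive_acquire(struct adaptive_lock *l)
{
        int expected = 0;

        while (!atomic_compare_exchange_weak(&l->locked, &expected, 1)) {
                expected = 0;
                if (atomic_load(&l->owner_running))
                        continue;       /* owner on-CPU: release is imminent, spin */
                sched_yield();          /* owner off-CPU: stand-in for sleeping */
        }
        atomic_store(&l->owner_running, true);
}

static void adaptive_release(struct adaptive_lock *l)
{
        atomic_store(&l->owner_running, false);
        atomic_store(&l->locked, 0);
}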
I have an adaptive lock implementation in my tree that eliminates
the contention between what looks like the IDE layer, work queues, and
the anticipatory scheduler, but that's not a real problem, unlike what
I've mentioned above. I can get you and others more specifics on the
problem if folks working on the lower layers want it.
Other than that, the -rt patch does quite well at instrumenting all
sorts of kernel behaviors, including contention and latency issues.
I don't need to tell you how valuable the -rt patch is for these
issues; it's obvious, and I'm sure you'll agree, that it's been
instrumental in discovering many problems with the stock kernel.
bill
On Thu, Feb 08, 2007 at 11:14:04PM -0800, Bill Huey wrote:
> I have an adaptive lock implementation in my tree that eliminates
> the contention between what looks like the IDE layer, work queues, and
> the anticipatory scheduler, but that's not a real problem, unlike what
> I've mentioned above. I can get you and others more specifics on the
> problem if folks working on the lower layers want it.
Correction, it eliminates all of the "live" contentions where the
mutex owner isn't asleep already.
bill