Hi,
generic_osync_inode() (called by generic_file_write()) is not checking if
the inode being synced has the I_LOCK bit set before checking the I_DIRTY
bit.
AFAICS, the following problem can happen:
sync()
...
sync_one()
reset I_DIRTY, set I_LOCK
filemap_fdatasync() <-- #window
write_inode() <-- #window
filemap_fdatawait() <-- #window
unset I_LOCK
There is no guarantee that the inode is fully synced until sync_one()
cleans the inode I_LOCK bit.
If generic_osync_inode() checks the I_DIRTY bit (and sees it clean) during
"#window", an "O_SYNC write()" call may return to userspace without having
all the data actually synced.
If I'm not missing something here this patch should the problem.
Comments?
--- fs/inode.c~ Thu Mar 22 16:04:13 2001
+++ fs/inode.c Thu Apr 12 15:18:22 2001
@@ -347,6 +347,11 @@
#endif
spin_lock(&inode_lock);
+ while (inode->i_state & I_LOCK) {
+ spin_unlock(&inode_lock);
+ __wait_on_inode(inode);
+ spin_lock(&inode_lock);
+ }
if (!(inode->i_state & I_DIRTY))
goto out;
if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> Comments?
>
> --- fs/inode.c~ Thu Mar 22 16:04:13 2001
> +++ fs/inode.c Thu Apr 12 15:18:22 2001
> @@ -347,6 +347,11 @@
> #endif
>
> spin_lock(&inode_lock);
> + while (inode->i_state & I_LOCK) {
> + spin_unlock(&inode_lock);
> + __wait_on_inode(inode);
> + spin_lock(&inode_lock);
> + }
> if (!(inode->i_state & I_DIRTY))
> goto out;
> if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
Ehh.
Why not just lock the inode around the thing?
The above looks rather ugly.
Linus
On Thu, 12 Apr 2001, Linus Torvalds wrote:
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
> >
> > Comments?
> >
> > --- fs/inode.c~ Thu Mar 22 16:04:13 2001
> > +++ fs/inode.c Thu Apr 12 15:18:22 2001
> > @@ -347,6 +347,11 @@
> > #endif
> >
> > spin_lock(&inode_lock);
> > + while (inode->i_state & I_LOCK) {
> > + spin_unlock(&inode_lock);
> > + __wait_on_inode(inode);
> > + spin_lock(&inode_lock);
> > + }
> > if (!(inode->i_state & I_DIRTY))
> > goto out;
> > if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
>
> Ehh.
>
> Why not just lock the inode around the thing?
>
> The above looks rather ugly.
You mean writing a function called "lock_inode()" or whatever to basically
do what I did ?
On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
>
> On Thu, 12 Apr 2001, Linus Torvalds wrote:
>
> > On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
> > >
> > > Comments?
> > >
> > > --- fs/inode.c~ Thu Mar 22 16:04:13 2001
> > > +++ fs/inode.c Thu Apr 12 15:18:22 2001
> > > @@ -347,6 +347,11 @@
> > > #endif
> > >
> > > spin_lock(&inode_lock);
> > > + while (inode->i_state & I_LOCK) {
> > > + spin_unlock(&inode_lock);
> > > + __wait_on_inode(inode);
> > > + spin_lock(&inode_lock);
> > > + }
> > > if (!(inode->i_state & I_DIRTY))
> > > goto out;
> > > if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
> >
> > Ehh.
> >
> > Why not just lock the inode around the thing?
> >
> > The above looks rather ugly.
>
> You mean writing a function called "lock_inode()" or whatever to basically
> do what I did ?
Oh well, its still bad.
We drop the inode_lock before calling write_inode_now() (which is broken,
too :)), which means someone can set I_LOCK under us.
I'll send you a patch to fix that one (and other callers of
write_inode_now()) later.
On Thu, 12 Apr 2001, Linus Torvalds wrote:
>
>
> On Thu, 12 Apr 2001, Marcelo Tosatti wrote:
> >
> > Comments?
> >
> > --- fs/inode.c~ Thu Mar 22 16:04:13 2001
> > +++ fs/inode.c Thu Apr 12 15:18:22 2001
> > @@ -347,6 +347,11 @@
> > #endif
> >
> > spin_lock(&inode_lock);
> > + while (inode->i_state & I_LOCK) {
> > + spin_unlock(&inode_lock);
> > + __wait_on_inode(inode);
> > + spin_lock(&inode_lock);
> > + }
> > if (!(inode->i_state & I_DIRTY))
> > goto out;
> > if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
>
> Ehh.
>
> Why not just lock the inode around the thing?
>
> The above looks rather ugly.
Ok, me again.
The inode->i_state locking is rather nasty: there is no need to lock the
inode. We just have to wait for it to become unlocked, since its
guaranteed that who locked it wrote it to disk. (sync_one())
Aviro suggested the following, which is much cleaner than the previous
patch:
--- fs/inode.c~ Thu Apr 12 21:15:23 2001
+++ fs/inode.c Thu Apr 12 21:16:35 2001
@@ -301,6 +301,8 @@
while (inode->i_state & I_DIRTY)
sync_one(inode, sync);
spin_unlock(&inode_lock);
+ if (sync)
+ wait_on_inode(inode);
}
else
printk("write_inode_now: no super block\n");
@@ -357,6 +359,7 @@
out:
spin_unlock(&inode_lock);
+ wait_on_inode(inode);
return err;
}