A recent regression (introduced after 2.6.21) was caught by the LTP test
fcntl11. It appears that F_GETLK is not properly checking for existing
F_RDLCK and allows taking out a write lock.
This can be demonstrated by either running fcntl11 from the LTP suite or
I have hacked up a much shorter version which demonstrates the issue and
am attaching it.
Using git bisect I came up with this commit as the one that introduced
the issue. I briefly tried to back this out from the current tree but
appears a lot has change since then so I will need to try that manually.
commit c2fa1b8a6c059dd08a802545fed3badc8df2adc1
Author: J. Bruce Fields <[email protected]>
Date: Tue Feb 20 16:10:11 2007 -0500
locks: create posix-to-flock helper functions
Factor out a bit of messy code by creating posix-to-flock counterparts
to the existing flock-to-posix helper functions.
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: "J. Bruce Fields" <[email protected]>
- Doug
On Thu, 2007-05-10 at 14:56 -0400, Doug Chapman wrote:
> A recent regression (introduced after 2.6.21) was caught by the LTP test
> fcntl11. It appears that F_GETLK is not properly checking for existing
> F_RDLCK and allows taking out a write lock.
>
> This can be demonstrated by either running fcntl11 from the LTP suite or
> I have hacked up a much shorter version which demonstrates the issue and
> am attaching it.
>
I should add that I have seen this on ia64 and x86_64. I do not
currently have any 32 bit systems to test on.
- Doug
On Thu, May 10, 2007 at 02:56:15PM -0400, Doug Chapman wrote:
> A recent regression (introduced after 2.6.21) was caught by the LTP test
> fcntl11. It appears that F_GETLK is not properly checking for existing
> F_RDLCK and allows taking out a write lock.
Ouch.
> This can be demonstrated by either running fcntl11 from the LTP suite or
> I have hacked up a much shorter version which demonstrates the issue and
> am attaching it.
>
> Using git bisect I came up with this commit as the one that introduced
> the issue.
Thanks for the report--investigating....
--b.
On Thu, May 10, 2007 at 03:30:50PM -0400, bfields wrote:
> On Thu, May 10, 2007 at 02:56:15PM -0400, Doug Chapman wrote:
> > A recent regression (introduced after 2.6.21) was caught by the LTP test
> > fcntl11. It appears that F_GETLK is not properly checking for existing
> > F_RDLCK and allows taking out a write lock.
>
> Ouch.
>
> > This can be demonstrated by either running fcntl11 from the LTP suite or
> > I have hacked up a much shorter version which demonstrates the issue and
> > am attaching it.
> >
> > Using git bisect I came up with this commit as the one that introduced
> > the issue.
>
> Thanks for the report--investigating....
Argh. Looks like a cut-n-paste error. Does this fix it?
--b.
diff --git a/fs/locks.c b/fs/locks.c
index 671a034..909f454 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
flock->l_whence = 0;
+ flock->l_type = fl->fl_type;
return 0;
}
On Thu, May 10, 2007 at 03:38:59PM -0400, bfields wrote:
> On Thu, May 10, 2007 at 03:30:50PM -0400, bfields wrote:
> > On Thu, May 10, 2007 at 02:56:15PM -0400, Doug Chapman wrote:
> > > A recent regression (introduced after 2.6.21) was caught by the LTP test
> > > fcntl11. It appears that F_GETLK is not properly checking for existing
> > > F_RDLCK and allows taking out a write lock.
Hm, actually, could you double-check the test results? Looking at your
test case, it appears that it fails when the lock returned from the
fcntl(.,F_GETLK,.) has an l_type != F_RDLCK. That doesn't necessarily
mean the F_GETLK is reporting no conflict. I believe the bug is
actually that it's reporting the wrong kind of conflict--so it's
returning l_type == F_WRLCK, not F_UNLCK.
Also, this affects only F_GETLK, not F_SETLK, so you're not actually
managing to acquire a conflicting lock, right?
--b.
On Thu, 2007-05-10 at 15:38 -0400, J. Bruce Fields wrote:
> On Thu, May 10, 2007 at 03:30:50PM -0400, bfields wrote:
> > On Thu, May 10, 2007 at 02:56:15PM -0400, Doug Chapman wrote:
> > > A recent regression (introduced after 2.6.21) was caught by the LTP test
> > > fcntl11. It appears that F_GETLK is not properly checking for existing
> > > F_RDLCK and allows taking out a write lock.
> >
> > Ouch.
> >
> > > This can be demonstrated by either running fcntl11 from the LTP suite or
> > > I have hacked up a much shorter version which demonstrates the issue and
> > > am attaching it.
> > >
> > > Using git bisect I came up with this commit as the one that introduced
> > > the issue.
> >
> > Thanks for the report--investigating....
>
> Argh. Looks like a cut-n-paste error. Does this fix it?
>
> --b.
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 671a034..909f454 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
> flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
> fl->fl_end - fl->fl_start + 1;
> flock->l_whence = 0;
> + flock->l_type = fl->fl_type;
> return 0;
> }
>
Bruce,
This doesn't fix the problem but it does look like it should be there.
I imagine this would have been the next bug we tripped over once the
original one is fixed.
- Doug
On Thu, 2007-05-10 at 16:23 -0400, J. Bruce Fields wrote:
> On Thu, May 10, 2007 at 03:38:59PM -0400, bfields wrote:
> > On Thu, May 10, 2007 at 03:30:50PM -0400, bfields wrote:
> > > On Thu, May 10, 2007 at 02:56:15PM -0400, Doug Chapman wrote:
> > > > A recent regression (introduced after 2.6.21) was caught by the LTP test
> > > > fcntl11. It appears that F_GETLK is not properly checking for existing
> > > > F_RDLCK and allows taking out a write lock.
>
> Hm, actually, could you double-check the test results? Looking at your
> test case, it appears that it fails when the lock returned from the
> fcntl(.,F_GETLK,.) has an l_type != F_RDLCK. That doesn't necessarily
> mean the F_GETLK is reporting no conflict. I believe the bug is
> actually that it's reporting the wrong kind of conflict--so it's
> returning l_type == F_WRLCK, not F_UNLCK.
You are partly right on the test however note that it is using a start
and len that are specific to the RDLCK so that should _only_ conflict
with that lock. I did notice that the LTP test is taking a new lock on
the entire file which should be blocked by eithe rthe RDLCK or the WRLCK
and it doesn't check both, I plan on fixing that once this is resolved.
But, much more importantly F_GETLK is returning F_UNLCK saying that
there was no conflict at all.
> Also, this affects only F_GETLK, not F_SETLK, so you're not actually
> managing to acquire a conflicting lock, right?
>
True, this doesn't actually acquire the lock. I have not tested to see
if trying a conflicting F_SETLK blocks as it should. I will test that
later. I missed lunch today so I won't get back to this until later
tonight or tomorrow....
- Doug
On Thu, May 10, 2007 at 05:01:05PM -0400, Doug Chapman wrote:
> You are partly right on the test however note that it is using a start
> and len that are specific to the RDLCK so that should _only_ conflict
> with that lock. I did notice that the LTP test is taking a new lock on
> the entire file which should be blocked by eithe rthe RDLCK or the WRLCK
> and it doesn't check both, I plan on fixing that once this is resolved.
>
> But, much more importantly F_GETLK is returning F_UNLCK saying that
> there was no conflict at all.
Argh, OK. I still can't see the problem yet, then. What filesystem is
this on?
--b.
On Thu, May 10, 2007 at 05:04:21PM -0400, bfields wrote:
> On Thu, May 10, 2007 at 05:01:05PM -0400, Doug Chapman wrote:
> > You are partly right on the test however note that it is using a start
> > and len that are specific to the RDLCK so that should _only_ conflict
> > with that lock. I did notice that the LTP test is taking a new lock on
> > the entire file which should be blocked by eithe rthe RDLCK or the WRLCK
> > and it doesn't check both, I plan on fixing that once this is resolved.
> >
> > But, much more importantly F_GETLK is returning F_UNLCK saying that
> > there was no conflict at all.
>
> Argh, OK. I still can't see the problem yet, then. What filesystem is
> this on?
Oh, cripes. I'm a loser. Next to figure out what's up with the
connectathon locking tests that they pass when GETLK never finds a
conflicting lock....
--b.
diff --git a/fs/locks.c b/fs/locks.c
index 53b0cd1..7fd2d17 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -670,7 +670,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
{
struct file_lock *cfl;
- fl->fl_type = F_UNLCK;
lock_kernel();
for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
if (!IS_POSIX(cfl))
@@ -682,7 +681,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
__locks_copy_lock(fl, cfl);
unlock_kernel();
return 1;
- }
+ } else
+ fl->fl_type = F_UNLCK;
unlock_kernel();
return 0;
}
In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock
to modify its single file_lock argument instead of taking separate input
and output arguments. This makes it no longer safe to set the output
lock's fl_type to F_UNLCK before looking for a conflict, since that
means searching for a conflict against a lock with type F_UNLCK.
This fixes a regression which causes F_GETLK to incorrectly report no
conflict on most filesystems (including any filesystem that doesn't do
its own locking).
Also fix posix_lock_to_flock() to copy the lock type. This isn't
strictly necessary, since the caller already does this; but it seems
less likely to cause confusion in the future.
Thanks to Doug Chapman for the bug report.
Signed-off-by: "J. Bruce Fields" <[email protected]>
---
fs/locks.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/locks.c b/fs/locks.c
index 671a034..8ec16ab 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -669,7 +669,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
{
struct file_lock *cfl;
- fl->fl_type = F_UNLCK;
lock_kernel();
for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
if (!IS_POSIX(cfl))
@@ -681,7 +680,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
__locks_copy_lock(fl, cfl);
unlock_kernel();
return 1;
- }
+ } else
+ fl->fl_type = F_UNLCK;
unlock_kernel();
return 0;
}
@@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
fl->fl_end - fl->fl_start + 1;
flock->l_whence = 0;
+ flock->l_type = fl->fl_type;
return 0;
}
--
1.5.1.1.107.g7a159
On Thu, 2007-05-10 at 18:38 -0400, J. Bruce Fields wrote:
> In 9d6a8c5c213e34c475e72b245a8eb709258e968c we changed posix_test_lock
> to modify its single file_lock argument instead of taking separate input
> and output arguments. This makes it no longer safe to set the output
> lock's fl_type to F_UNLCK before looking for a conflict, since that
> means searching for a conflict against a lock with type F_UNLCK.
>
> This fixes a regression which causes F_GETLK to incorrectly report no
> conflict on most filesystems (including any filesystem that doesn't do
> its own locking).
>
> Also fix posix_lock_to_flock() to copy the lock type. This isn't
> strictly necessary, since the caller already does this; but it seems
> less likely to cause confusion in the future.
>
> Thanks to Doug Chapman for the bug report.
>
> Signed-off-by: "J. Bruce Fields" <[email protected]>
> ---
> fs/locks.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/locks.c b/fs/locks.c
> index 671a034..8ec16ab 100644
> --- a/fs/locks.c
> +++ b/fs/locks.c
> @@ -669,7 +669,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> {
> struct file_lock *cfl;
>
> - fl->fl_type = F_UNLCK;
> lock_kernel();
> for (cfl = filp->f_path.dentry->d_inode->i_flock; cfl; cfl = cfl->fl_next) {
> if (!IS_POSIX(cfl))
> @@ -681,7 +680,8 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
> __locks_copy_lock(fl, cfl);
> unlock_kernel();
> return 1;
> - }
> + } else
> + fl->fl_type = F_UNLCK;
> unlock_kernel();
> return 0;
> }
> @@ -1632,6 +1632,7 @@ static int posix_lock_to_flock(struct flock *flock, struct file_lock *fl)
> flock->l_len = fl->fl_end == OFFSET_MAX ? 0 :
> fl->fl_end - fl->fl_start + 1;
> flock->l_whence = 0;
> + flock->l_type = fl->fl_type;
> return 0;
> }
>
I tested this both with my little hacked up test program as well as with
the LTP tests. Looks good. Nice job on the quick turnaround on this
Bruce.
- Doug