2008-01-09 02:45:42

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH 2/2] Fix lockd panic

Neil Brown wrote:

>
>If I'm reading this correctly, this bug is introduced by your previous
>patch.
>
>
Depending on how you see the issue. From my end, I view this as the
existing code has a "trap" and I fell into it. This is probably a chance
to clean up this logic.

>The important difference between the old code and the new code here is
>that the old code tests "file->f_locks" while the new code iterates
>through i_flock to see if there are any lockd locks.
>
>f_locks is set to a count of lockd locks in nlm_traverse_locks which
>*was* always called by nlm_inspect_file which is called immediately
>before the code you are changing.
>But since your patch, nlm_inspect_file does not always call
>nlm_traverse_locks, so there is a chance that f_locks is wrong.
>
>With this patch, f_locks it not used at all any more.
>
>
Yes, a fair description of the issue !

>Introducing a bug in one patch and fixing in the next is bad style.
>
>
ok .....

>Some options:
>
> Have an initial patch which removes all references to f_locks and
> includes the change in this patch. With that in place your main
> patch won't introduce a bug. If you do this, you should attempt to
> understand and justify the performance impact (does nlm_traverse_files
> become quadratic in number of locks. Is that acceptable?).
>
> Change the first patch to explicitly update f_count if you bypass the
> call to nlm_inspect_file.
>
> something else???
>
>

Let's see what hch says in another email... will come back to this soon.

>So NAK for this one in it's current form... unless I've misunderstood
>something.
>
>
>
I expect this :)

-- Wendy



2008-01-09 04:26:16

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH 2/2] Fix lockd panic

Wendy Cheng wrote:

> Neil Brown wrote:
>
>> Some options:
>>
>> Have an initial patch which removes all references to f_locks and
>> includes the change in this patch. With that in place your main
>> patch won't introduce a bug. If you do this, you should attempt to
>> understand and justify the performance impact (does nlm_traverse_files
>> become quadratic in number of locks. Is that acceptable?).
>>
>> Change the first patch to explicitly update f_count if you bypass the
>> call to nlm_inspect_file.
>>
>> something else???
>>
>>
>
> Let's see what hch says in another email... will come back to this soon.
>

Will do a quick and dirty performance measure tomorrow when I get to the
office. Then we'll go from there.

-- Wendy

2008-01-09 23:33:25

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH 2/2] Fix lockd panic

Wendy Cheng wrote:
> Wendy Cheng wrote:
>
>> Neil Brown wrote:
>>
>>> Some options:
>>>
>>> Have an initial patch which removes all references to f_locks and
>>> includes the change in this patch. With that in place your main
>>> patch won't introduce a bug. If you do this, you should attempt to
>>> understand and justify the performance impact (does nlm_traverse_files
>>> become quadratic in number of locks. Is that acceptable?).
>>>
>>> Change the first patch to explicitly update f_count if you bypass the
>>> call to nlm_inspect_file.
>>>
>>> something else???
>>>
>>>
>>
>> Let's see what hch says in another email... will come back to this soon.
>>
>
> Will do a quick and dirty performance measure tomorrow when I get to
> the office. Then we'll go from there.

The test didn't go well and I'm still debugging .. However, let's
revisit the issue:

[quot from Neil's comment]
The important difference between the old code and the new code here is
that the old code tests "file->f_locks" while the new code iterates
through i_flock to see if there are any lockd locks.

f_locks is set to a count of lockd locks in nlm_traverse_locks which
*was* always called by nlm_inspect_file which is called immediately
before the code you are changing.
But since your patch, nlm_inspect_file does not always call
nlm_traverse_locks, so there is a chance that f_locks is wrong.

With this patch, f_locks it not used at all any more.
[end quot]

The point here is "with this patch, f_locks it not used at all any
more." Note that we have a nice inline function "nlm_file_inuse", why
should we use f_locks (that I assume people agree that it is awkward) ?
Could we simply drop f_locks all together in this section of code?

-- Wendy

2008-01-12 06:34:20

by Wendy Cheng

[permalink] [raw]
Subject: Re: [PATCH 2/2] Fix lockd panic

Wendy Cheng wrote:

> The point here is "with this patch, f_locks it not used at all any
> more." Note that we have a nice inline function "nlm_file_inuse", why
> should we use f_locks (that I assume people agree that it is awkward)
> ? Could we simply drop f_locks all together in this section of code?
>
Start to have a second thought about this ....

Removing f_locks does make the code much more readable. However, if
inode->i_flock list is long, e.g. large amount of (clients) hosts and/or
processes from other hosts competing for the same lock, we don't want to
do the list walk twice within nlm_traverse_files(). Intuitively this is
unlikely but I prefer not changing the current behavior. As the result,
I'm withdrawing the patch all together.

The first patch will handle the issue correctly. Will submit it after
this post.

-- Wendy