2009-02-11 11:23:19

by Frank van Maarseveen

Subject: [NLM] 2.6.27.14 breakage when grace period expires

I'm sorry to inform you but... it seems there is a problem in the NLM
subsystem similar to the one reported previously, but this time it is
triggered when the grace period expires after a reboot.

Client and server run 2.6.27.14 + previous fix, NFSv3.

On the client there are three shells running:

while :; do lck -w /mnt/foo 2; done

The "lck" program is the same as posted before and it obtains an exclusive
write lock then waits 2 seconds in above invocation (there's probably an
"fcntl" command equivalent). After an orderly server reboot + grace time
expiration one of above command loops reports:

lck: fcntl: No locks available

and all three get stuck. After ^C-ing all "lck" loops the server still
shows an entry in /proc/locks, which leaves the file locked
indefinitely. Maybe two loops are sufficient to reproduce the issue or
maybe you need more, I don't know.
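
For reference, a minimal sketch of what the "lck" program presumably does,
reconstructed from the description above (the real program was posted
earlier in the thread; the option handling, file creation and exit behavior
here are assumptions):

/* lck.c -- reconstruction (assumption): take an exclusive write lock on
 * the given file with fcntl(), hold it for N seconds, then exit.
 * Usage: lck -w <file> <seconds>
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    struct flock fl;
    int fd;

    if (argc != 4 || strcmp(argv[1], "-w") != 0) {
        fprintf(stderr, "usage: %s -w <file> <seconds>\n", argv[0]);
        return 2;
    }
    fd = open(argv[2], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("lck: open");
        return 1;
    }
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;        /* exclusive write lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* whole file */

    /* F_SETLKW waits for the lock; over NFSv3 this becomes an NLM LOCK
     * request, and ENOLCK is reported as "No locks available". */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("lck: fcntl");
        return 1;
    }
    sleep(atoi(argv[3]));       /* hold the lock for N seconds */
    return 0;                   /* lock released on process exit */
}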

Interestingly, during the grace time at least one of the "lck" processes
should have re-obtained the lock but it didn't show up in /proc/locks
on the server.

Interestingly (#2), after removing the file on the server (i.e. no
sillyrename) the now free inode is still locked according to /proc/locks.
Even stopping/starting /etc/init.d/nfs-kernel-server plus "echo
3 >/proc/sys/vm/drop_caches" did not remove the lock (it did re-enter
grace).

--
Frank


2009-02-11 20:35:52

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> I'm sorry to inform you but... it seems that there is a similar problem
> in the NLM subsystem as reported previously but this time it is triggered
> when the grace time expires after a reboot.
>
> Client and server run 2.6.27.14 + previous fix, NFSv3.
>
> On the client there are three shells running:
>
> while :; do lck -w /mnt/foo 2; done
>
> The "lck" program is the same as posted before and it obtains an exclusive
> write lock then waits 2 seconds in above invocation (there's probably an
> "fcntl" command equivalent). After an orderly server reboot + grace time

How are you rebooting the server?

--b.

> expiration one of above command loops reports:
>
> lck: fcntl: No locks available
>
> and all three get stuck. After ^C-ing all "lck" loops the server still
> shows an entry in /proc/locks which causes the file to be locked
> indefinately. Maybe two loops are sufficient to reproduce the issue or
> maybe you need more, I don't know.
>
> Interestingly, during the grace time at least one of the "lck" processes
> should have re-obtained the lock but it didn't show up in /proc/locks
> on the server.
>
> Interestingly (#2), after removing the file on the server (i.e. no
> sillyrename) the now free inode is still locked according to /proc/locks.
> Even stopping/starting /etc/init.d/nfs-kernel-server plus "echo
> 3 >/proc/sys/vm/drop_caches" did not remove the lock (it did re-enter
> grace).
>
> --
> Frank

2009-02-11 20:37:05

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > I'm sorry to inform you but... it seems that there is a similar problem
> > in the NLM subsystem as reported previously but this time it is triggered
> > when the grace time expires after a reboot.
> >
> > Client and server run 2.6.27.14 + previous fix, NFSv3.
> >
> > On the client there are three shells running:
> >
> > while :; do lck -w /mnt/foo 2; done
> >
> > The "lck" program is the same as posted before and it obtains an exclusive
> > write lock then waits 2 seconds in above invocation (there's probably an
> > "fcntl" command equivalent). After an orderly server reboot + grace time
>
> How are you rebooting the server?

"reboot"

>
> --b.
>
> > expiration one of above command loops reports:
> >
> > lck: fcntl: No locks available
> >
> > and all three get stuck. After ^C-ing all "lck" loops the server still
> > shows an entry in /proc/locks which causes the file to be locked
> > indefinately. Maybe two loops are sufficient to reproduce the issue or
> > maybe you need more, I don't know.
> >
> > Interestingly, during the grace time at least one of the "lck" processes
> > should have re-obtained the lock but it didn't show up in /proc/locks
> > on the server.
> >
> > Interestingly (#2), after removing the file on the server (i.e. no
> > sillyrename) the now free inode is still locked according to /proc/locks.
> > Even stopping/starting /etc/init.d/nfs-kernel-server plus "echo
> > 3 >/proc/sys/vm/drop_caches" did not remove the lock (it did re-enter
> > grace).
> >
> > --
> > Frank

--
Frank

2009-02-11 20:39:40

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > I'm sorry to inform you but... it seems that there is a similar problem
> > > in the NLM subsystem as reported previously but this time it is triggered
> > > when the grace time expires after a reboot.
> > >
> > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > >
> > > On the client there are three shells running:
> > >
> > > while :; do lck -w /mnt/foo 2; done
> > >
> > > The "lck" program is the same as posted before and it obtains an exclusive
> > > write lock then waits 2 seconds in above invocation (there's probably an
> > > "fcntl" command equivalent). After an orderly server reboot + grace time
> >
> > How are you rebooting the server?
>
> "reboot"

Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
server is actually sending the reboot notification to the client, and
that the client is trying to reclaim? (Wireshark should make this all
fairly clear. But capture the traffic with tcpdump -s0 -wtmp.pcap and
send it to me if you're having trouble interpreting it.)

--b.

>
> >
> > --b.
> >
> > > expiration one of above command loops reports:
> > >
> > > lck: fcntl: No locks available
> > >
> > > and all three get stuck. After ^C-ing all "lck" loops the server still
> > > shows an entry in /proc/locks which causes the file to be locked
> > > indefinately. Maybe two loops are sufficient to reproduce the issue or
> > > maybe you need more, I don't know.
> > >
> > > Interestingly, during the grace time at least one of the "lck" processes
> > > should have re-obtained the lock but it didn't show up in /proc/locks
> > > on the server.
> > >
> > > Interestingly (#2), after removing the file on the server (i.e. no
> > > sillyrename) the now free inode is still locked according to /proc/locks.
> > > Even stopping/starting /etc/init.d/nfs-kernel-server plus "echo
> > > 3 >/proc/sys/vm/drop_caches" did not remove the lock (it did re-enter
> > > grace).
> > >
> > > --
> > > Frank
>
> --
> Frank

2009-02-11 20:57:10

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > > I'm sorry to inform you but... it seems that there is a similar problem
> > > > in the NLM subsystem as reported previously but this time it is triggered
> > > > when the grace time expires after a reboot.
> > > >
> > > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > > >
> > > > On the client there are three shells running:
> > > >
> > > > while :; do lck -w /mnt/foo 2; done
> > > >
> > > > The "lck" program is the same as posted before and it obtains an exclusive
> > > > write lock then waits 2 seconds in above invocation (there's probably an
> > > > "fcntl" command equivalent). After an orderly server reboot + grace time
> > >
> > > How are you rebooting the server?
> >
> > "reboot"
>
> Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> server is actually sending the reboot notification to the client, and
> that the client is trying to reclaim? (Wireshark should make this all
> fairly clear. But capture the traffic with tcpdump -s0 -wtmp.pcap and
> send it to me if you're having trouble interpreting it.)

I can't try it right now, but tomorrow I can. However, I'm pretty sure at
least the reboot notification is there, because:

1) The issue also happens in a totally different NFS server setup, one
   which by definition invokes sm-notify in a script. That is the real
   use case.
2) Even if it weren't, I would expect different behavior from what I saw.
   A lost reboot notification is always possible, but in that case the
   client(s) might end up holding more locks than the server, not the
   other way around as it is right now.

I'll make a capture.

--
Frank

2009-02-12 14:28:33

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > > I'm sorry to inform you but... it seems that there is a similar problem
> > > > in the NLM subsystem as reported previously but this time it is triggered
> > > > when the grace time expires after a reboot.
> > > >
> > > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > > >
> > > > On the client there are three shells running:
> > > >
> > > > while :; do lck -w /mnt/foo 2; done
> > > >
> > > > The "lck" program is the same as posted before and it obtains an exclusive
> > > > write lock then waits 2 seconds in above invocation (there's probably an
> > > > "fcntl" command equivalent). After an orderly server reboot + grace time
> > >
> > > How are you rebooting the server?
> >
> > "reboot"
>
> Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> server is actually sending the reboot notification to the client, and
> that the client is trying to reclaim? (Wireshark should make this all
> fairly clear. But capture the traffic with tcpdump -s0 -wtmp.pcap and
> send it to me if you're having trouble interpreting it.)

I have a capture with comments below. It raised so many questions
that I decided to do some more testing, trying to figure out how
it looks when the locking works. The issue now appears to predate the
fuse changes and is also present when both client and server run
2.6.24.4. I decided to stick with the traffic capture for 2.6.27.14 +
previous fix as discussed earlier. The full capture is available at
http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k and
was started on the server as part of the initscripts, right after the
reboot, filtered on the client IP address.

Exported by wireshark (filter: nfs or stat or nlm) and condensed:

# time src prot
1 0.000000 client: NFS V3 GETATTR Call (Reply In 42), FH:0x0308030a
2 0.000018 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
5 0.000583 server: ICMP Destination unreachable (Port unreachable)
6 0.000589 server: ICMP Destination unreachable (Port unreachable)
7 1.891277 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
8 1.891320 server: ICMP Destination unreachable (Port unreachable)
9 5.827053 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
10 5.827119 server: ICMP Destination unreachable (Port unreachable)
11 14.626501 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
12 14.626587 server: ICMP Destination unreachable (Port unreachable)
15 15.726426 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
16 15.726505 server: ICMP Destination unreachable (Port unreachable)
17 17.926284 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
18 17.926368 server: ICMP Destination unreachable (Port unreachable)
25 22.326006 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
26 22.326090 server: ICMP Destination unreachable (Port unreachable)
35 30.022271 client: NLM V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0
36 30.029511 server: NLM V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD
37 30.029660 client: NLM V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0
38 30.029691 client: NLM V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0
39 30.029884 server: NLM V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD
40 30.029914 server: NLM V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD
41 31.125403 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
42 31.127499 server: NFS V3 GETATTR Reply (Call In 1) Directory mode:0755 uid:0 gid:0
43 31.127942 client: NFS V3 GETATTR Call (Reply In 45), FH:0x0308030a
45 31.129378 server: NFS V3 GETATTR Reply (Call In 43) Directory mode:0755 uid:0 gid:0
47 31.129958 server: STAT V1 NOTIFY Call (Reply In 48)
48 31.130301 client: STAT V1 NOTIFY Reply (Call In 47)

Reboot notification ok.

51 35.029968 client: NLM V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0
52 35.030003 client: NLM V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0
53 35.030016 client: NLM V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0
54 35.030085 server: NLM V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD
55 35.030126 server: NLM V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD
56 35.030153 server: NLM V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD

The three contending client processes. I don't see a lock registration for
svid:114, only UNLOCK calls, which fail with NLM_DENIED_GRACE_PERIOD. The
above goes on for a while. Neither the server nor the client shows any lock
in /proc/locks at this point.

166 115.028376 client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
167 115.028394 client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
168 115.028440 server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
169 115.028465 server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
170 120.027233 client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
171 120.027337 server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
172 120.028234 client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
173 120.028258 client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
174 120.030601 server: NLM V4 LOCK Reply (Call In 173)
175 120.030656 server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED

This doesn't add up. There hasn't been a successful unlock for svid:114
yet (that only happens at #213), but one of the locks is still granted.

176 120.030781 client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
177 120.030849 server: NLM V4 LOCK Reply (Call In 176)

Strange: an identical lock request but with a different rpc xid (i.e. no
packet duplication).

178 120.031078 client: NFS V3 GETATTR Call (Reply In 179), FH:0xcafa61cc
179 120.031154 server: NFS V3 GETATTR Reply (Call In 178) Regular File mode:0644 uid:363 gid:1500
180 120.033973 client: NFS V3 ACCESS Call (Reply In 181), FH:0x0308030a
181 120.034030 server: NFS V3 ACCESS Reply (Call In 180)
182 120.034223 client: NFS V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc
183 120.034285 server: NFS V3 LOOKUP Reply (Call In 182), FH:0x81685ca0
184 120.034472 client: NFS V3 ACCESS Call (Reply In 185), FH:0x0308030c
185 120.034526 server: NFS V3 ACCESS Reply (Call In 184)
186 120.034722 client: NFS V3 ACCESS Call (Reply In 187), FH:0x0308030c
187 120.034776 server: NFS V3 ACCESS Reply (Call In 186)
188 120.034922 client: NFS V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest
189 120.034993 server: NFS V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc
190 120.035172 client: NFS V3 ACCESS Call (Reply In 191), FH:0xcafa61cc
191 120.035230 server: NFS V3 ACCESS Reply (Call In 190)
193 122.032218 client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
194 122.032253 client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
195 122.032343 server: NLM V4 UNLOCK Reply (Call In 193)
197 122.032794 server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED
201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201)
205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205)
207 122.036312 client: NFS V3 GETATTR Call (Reply In 208), FH:0xcafa61cc
208 122.036394 server: NFS V3 GETATTR Reply (Call In 207) Regular File mode:0644 uid:363 gid:1500
209 122.036611 client: NLM V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0
210 122.036674 server: NLM V4 LOCK Reply (Call In 209) NLM_BLOCKED
213 125.027091 client: NLM V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0
214 125.027194 server: NLM V4 UNLOCK Reply (Call In 213)
215 125.029487 client: NFS V3 GETATTR Call (Reply In 216), FH:0xcafa61cc
216 125.029570 server: NFS V3 GETATTR Reply (Call In 215) Regular File mode:0644 uid:363 gid:1500
217 125.029836 client: NLM V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0
218 125.029895 server: NLM V4 LOCK Reply (Call In 217) NLM_BLOCKED
224 152.032157 client: NLM V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0
225 152.032283 server: NLM V4 LOCK Reply (Call In 224) NLM_BLOCKED
226 152.035103 client: NLM V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0
227 152.035157 server: NLM V4 LOCK Reply (Call In 226) NLM_BLOCKED
230 155.029676 client: NLM V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0
231 155.029761 server: NLM V4 LOCK Reply (Call In 230) NLM_BLOCKED

To recap the problem: one of the fcntl calls to obtain a write lock
returns

lck: fcntl: No locks available

shortly after the grace period expires. After that everything gets stuck,
with the server holding a write lock that has no corresponding client-side
lock.


IMO it looks like the client is to blame, even if the server should or
could have accepted the UNLOCK during grace (I don't know, I'm not an
expert on that one).

--
Frank

2009-02-12 15:16:38

by Trond Myklebust

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 15:28 +0100, Frank van Maarseveen wrote:
> On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> > On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> > > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > > > I'm sorry to inform you but... it seems that there is a similar problem
> > > > > in the NLM subsystem as reported previously but this time it is triggered
> > > > > when the grace time expires after a reboot.
> > > > >
> > > > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > > > >
> > > > > On the client there are three shells running:
> > > > >
> > > > > while :; do lck -w /mnt/foo 2; done
> > > > >
> > > > > The "lck" program is the same as posted before and it obtains an exclusive
> > > > > write lock then waits 2 seconds in above invocation (there's probably an
> > > > > "fcntl" command equivalent). After an orderly server reboot + grace time
> > > >
> > > > How are you rebooting the server?
> > >
> > > "reboot"
> >
> > Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> > server is actually sending the reboot notification to the client, and
> > that the client is trying to reclaim? (Wireshark should make this all
> > fairly clear. But capture the traffic with tcpdump -s0 -wtmp.pcap and
> > send it to me if you're having trouble interpreting it.)
>
> I have a capture with comment below. It raised so many questions
> that I decided to do some more testing, trying to figure out how
> it looks when the locking works. This issue now appears to predate the
> fuse changes and is also present when both client and server run
> 2.6.24.4. I decided to stick with the traffic capture for 2.7.27.14 +
> previous fix as discussed earlier. The full capture is available at
> http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k and
> was started on the server as part of initscripts, right after the reboot
> and filtered on client IP address.
>
> Exported by wireshark (filter: nfs or stat or nlm) and condensed:
>
> # time src prot
> 1 0.000000 client: NFS V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 2 0.000018 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 5 0.000583 server: ICMP Destination unreachable (Port unreachable)
> 6 0.000589 server: ICMP Destination unreachable (Port unreachable)
> 7 1.891277 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 8 1.891320 server: ICMP Destination unreachable (Port unreachable)
> 9 5.827053 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 10 5.827119 server: ICMP Destination unreachable (Port unreachable)
> 11 14.626501 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 12 14.626587 server: ICMP Destination unreachable (Port unreachable)
> 15 15.726426 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 16 15.726505 server: ICMP Destination unreachable (Port unreachable)
> 17 17.926284 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 18 17.926368 server: ICMP Destination unreachable (Port unreachable)
> 25 22.326006 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 26 22.326090 server: ICMP Destination unreachable (Port unreachable)
> 35 30.022271 client: NLM V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0
> 36 30.029511 server: NLM V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD
> 37 30.029660 client: NLM V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0
> 38 30.029691 client: NLM V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0
> 39 30.029884 server: NLM V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD
> 40 30.029914 server: NLM V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD
> 41 31.125403 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> 42 31.127499 server: NFS V3 GETATTR Reply (Call In 1) Directory mode:0755 uid:0 gid:0
> 43 31.127942 client: NFS V3 GETATTR Call (Reply In 45), FH:0x0308030a
> 45 31.129378 server: NFS V3 GETATTR Reply (Call In 43) Directory mode:0755 uid:0 gid:0
> 47 31.129958 server: STAT V1 NOTIFY Call (Reply In 48)
> 48 31.130301 client: STAT V1 NOTIFY Reply (Call In 47)
>
> Reboot notification ok.
>
> 51 35.029968 client: NLM V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0
> 52 35.030003 client: NLM V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0
> 53 35.030016 client: NLM V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0
> 54 35.030085 server: NLM V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD
> 55 35.030126 server: NLM V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD
> 56 35.030153 server: NLM V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD
>
> The three contending client processes. I don't see a lock registration for
> svid:114, only UNLOCK calls which fail with NLM_DENIED_GRACE_PERIOD. The
> above goes on for a while. Neither the server or client shows any lock
> in /proc/locks at this point.
>
> 166 115.028376 client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
> 167 115.028394 client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
> 168 115.028440 server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
> 169 115.028465 server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
> 170 120.027233 client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
> 171 120.027337 server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
> 172 120.028234 client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
> 173 120.028258 client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
> 174 120.030601 server: NLM V4 LOCK Reply (Call In 173)
> 175 120.030656 server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED
>
> This doesn't add up. There hasn't been a successful unlock for svid:114
> (see #213 for that) but still one of the locks is granted.

Has the client attempted to recover the lock for svid:114? If not, then
the server has no knowledge of that lock.

> 176 120.030781 client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
> 177 120.030849 server: NLM V4 LOCK Reply (Call In 176)
>
> Strange: an identical lock request but with a different rpc xid (i.e. no
> packet duplication).

No. That would be the non-blocking lock that is intended as a 'ping' to
see if the server is still alive. It duplicates the blocking lock in all
details except that the 'block' flag is not set.
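
For context, the 'block' flag lives in the NLM LOCK call arguments. Roughly,
following the published NLM version 4 protocol description (the C rendering
and types here are approximations, not the kernel's internal structures):

#include <stdint.h>
#include <stdbool.h>

struct netobj { uint32_t len; unsigned char *data; };

struct nlm4_lock {
    char            *caller_name;  /* client hostname */
    struct netobj    fh;           /* file handle (0xcafa61cc above) */
    struct netobj    oh;           /* opaque lock owner handle */
    int32_t          svid;         /* the per-process id seen in the trace */
    uint64_t         l_offset;
    uint64_t         l_len;
};

struct nlm4_lockargs {
    struct netobj    cookie;
    bool             block;        /* set on the blocking request; clear on
                                    * the duplicate "ping" described above */
    bool             exclusive;    /* write lock */
    struct nlm4_lock alock;
    bool             reclaim;      /* set when reclaiming a lock in grace */
    int32_t          state;        /* client's NSM state number */
};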

> 178 120.031078 client: NFS V3 GETATTR Call (Reply In 179), FH:0xcafa61cc
> 179 120.031154 server: NFS V3 GETATTR Reply (Call In 178) Regular File mode:0644 uid:363 gid:1500
> 180 120.033973 client: NFS V3 ACCESS Call (Reply In 181), FH:0x0308030a
> 181 120.034030 server: NFS V3 ACCESS Reply (Call In 180)
> 182 120.034223 client: NFS V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc
> 183 120.034285 server: NFS V3 LOOKUP Reply (Call In 182), FH:0x81685ca0
> 184 120.034472 client: NFS V3 ACCESS Call (Reply In 185), FH:0x0308030c
> 185 120.034526 server: NFS V3 ACCESS Reply (Call In 184)
> 186 120.034722 client: NFS V3 ACCESS Call (Reply In 187), FH:0x0308030c
> 187 120.034776 server: NFS V3 ACCESS Reply (Call In 186)
> 188 120.034922 client: NFS V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest
> 189 120.034993 server: NFS V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc
> 190 120.035172 client: NFS V3 ACCESS Call (Reply In 191), FH:0xcafa61cc
> 191 120.035230 server: NFS V3 ACCESS Reply (Call In 190)
> 193 122.032218 client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
> 194 122.032253 client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
> 195 122.032343 server: NLM V4 UNLOCK Reply (Call In 193)
> 197 122.032794 server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED
> 201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
> 202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201)
> 205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
> 206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205)

What happened here? Why did the client refuse the lock for svid 116?

Did the task get signalled? If so, where is the CANCEL request?

> 207 122.036312 client: NFS V3 GETATTR Call (Reply In 208), FH:0xcafa61cc
> 208 122.036394 server: NFS V3 GETATTR Reply (Call In 207) Regular File mode:0644 uid:363 gid:1500
> 209 122.036611 client: NLM V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0
> 210 122.036674 server: NLM V4 LOCK Reply (Call In 209) NLM_BLOCKED
> 213 125.027091 client: NLM V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0
> 214 125.027194 server: NLM V4 UNLOCK Reply (Call In 213)
> 215 125.029487 client: NFS V3 GETATTR Call (Reply In 216), FH:0xcafa61cc
> 216 125.029570 server: NFS V3 GETATTR Reply (Call In 215) Regular File mode:0644 uid:363 gid:1500
> 217 125.029836 client: NLM V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0
> 218 125.029895 server: NLM V4 LOCK Reply (Call In 217) NLM_BLOCKED
> 224 152.032157 client: NLM V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0
> 225 152.032283 server: NLM V4 LOCK Reply (Call In 224) NLM_BLOCKED
> 226 152.035103 client: NLM V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0
> 227 152.035157 server: NLM V4 LOCK Reply (Call In 226) NLM_BLOCKED
> 230 155.029676 client: NLM V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0
> 231 155.029761 server: NLM V4 LOCK Reply (Call In 230) NLM_BLOCKED
>
> To recap the problem: one of the fcntl calls to obtain a write lock
> returns
>
> lck: fcntl: No locks available
>
> shortly after the grace period expires. After that everything gets stuck,
> server holding a write lock with no corresponding client side lock.
>
>
> IMO looks like the client is to blame, even if/when the server
> should/could have accepted UNLOCK during grace (I don't know, I'm not
> an expert on that one).

Possibly... It depends entirely on what happened to cause it to deny the
GRANTED callback...

Trond


2009-02-12 15:36:37

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, Feb 12, 2009 at 10:16:29AM -0500, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:28 +0100, Frank van Maarseveen wrote:
> > On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> > > On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> > > > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > > > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > > > > > I'm sorry to inform you but... it seems that there is a similar problem
> > > > > > in the NLM subsystem as reported previously but this time it is triggered
> > > > > > when the grace time expires after a reboot.
> > > > > >
> > > > > > Client and server run 2.6.27.14 + previous fix, NFSv3.
> > > > > >
> > > > > > On the client there are three shells running:
> > > > > >
> > > > > > while :; do lck -w /mnt/foo 2; done
> > > > > >
> > > > > > The "lck" program is the same as posted before and it obtains an exclusive
> > > > > > write lock then waits 2 seconds in above invocation (there's probably an
> > > > > > "fcntl" command equivalent). After an orderly server reboot + grace time
> > > > >
> > > > > How are you rebooting the server?
> > > >
> > > > "reboot"
> > >
> > > Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> > > server is actually sending the reboot notification to the client, and
> > > that the client is trying to reclaim? (Wireshark should make this all
> > > fairly clear. But capture the traffic with tcpdump -s0 -wtmp.pcap and
> > > send it to me if you're having trouble interpreting it.)
> >
> > I have a capture with comment below. It raised so many questions
> > that I decided to do some more testing, trying to figure out how
> > it looks when the locking works. This issue now appears to predate the
> > fuse changes and is also present when both client and server run
> > 2.6.24.4. I decided to stick with the traffic capture for 2.7.27.14 +
> > previous fix as discussed earlier. The full capture is available at
> > http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k and
> > was started on the server as part of initscripts, right after the reboot
> > and filtered on client IP address.
> >
> > Exported by wireshark (filter: nfs or stat or nlm) and condensed:
> >
> > # time src prot
> > 1 0.000000 client: NFS V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 2 0.000018 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 5 0.000583 server: ICMP Destination unreachable (Port unreachable)
> > 6 0.000589 server: ICMP Destination unreachable (Port unreachable)
> > 7 1.891277 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 8 1.891320 server: ICMP Destination unreachable (Port unreachable)
> > 9 5.827053 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 10 5.827119 server: ICMP Destination unreachable (Port unreachable)
> > 11 14.626501 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 12 14.626587 server: ICMP Destination unreachable (Port unreachable)
> > 15 15.726426 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 16 15.726505 server: ICMP Destination unreachable (Port unreachable)
> > 17 17.926284 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 18 17.926368 server: ICMP Destination unreachable (Port unreachable)
> > 25 22.326006 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 26 22.326090 server: ICMP Destination unreachable (Port unreachable)
> > 35 30.022271 client: NLM V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0
> > 36 30.029511 server: NLM V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD
> > 37 30.029660 client: NLM V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0
> > 38 30.029691 client: NLM V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0
> > 39 30.029884 server: NLM V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD
> > 40 30.029914 server: NLM V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD
> > 41 31.125403 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
> > 42 31.127499 server: NFS V3 GETATTR Reply (Call In 1) Directory mode:0755 uid:0 gid:0
> > 43 31.127942 client: NFS V3 GETATTR Call (Reply In 45), FH:0x0308030a
> > 45 31.129378 server: NFS V3 GETATTR Reply (Call In 43) Directory mode:0755 uid:0 gid:0
> > 47 31.129958 server: STAT V1 NOTIFY Call (Reply In 48)
> > 48 31.130301 client: STAT V1 NOTIFY Reply (Call In 47)
> >
> > Reboot notification ok.
> >
> > 51 35.029968 client: NLM V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0
> > 52 35.030003 client: NLM V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0
> > 53 35.030016 client: NLM V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0
> > 54 35.030085 server: NLM V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD
> > 55 35.030126 server: NLM V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD
> > 56 35.030153 server: NLM V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD
> >
> > The three contending client processes. I don't see a lock registration for
> > svid:114, only UNLOCK calls which fail with NLM_DENIED_GRACE_PERIOD. The
> > above goes on for a while. Neither the server or client shows any lock
> > in /proc/locks at this point.
> >
> > 166 115.028376 client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
> > 167 115.028394 client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
> > 168 115.028440 server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
> > 169 115.028465 server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
> > 170 120.027233 client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
> > 171 120.027337 server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
> > 172 120.028234 client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
> > 173 120.028258 client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
> > 174 120.030601 server: NLM V4 LOCK Reply (Call In 173)
> > 175 120.030656 server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED
> >
> > This doesn't add up. There hasn't been a successful unlock for svid:114
> > (see #213 for that) but still one of the locks is granted.
>
> Has the lock for svid:114 been attempted recovered by the client? If
> not, then the server has no knowledge of that lock.

Exactly. Apparently the client tries to unlock an unrecovered lock.

>
> > 176 120.030781 client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
> > 177 120.030849 server: NLM V4 LOCK Reply (Call In 176)
> >
> > Strange: an identical lock request but with a different rpc xid (i.e. no
> > packet duplication).
>
> No. That would be the non-blocking lock that is intended as a 'ping' to
> see if the server is still alive. It duplicates the blocking lock in all
> details except that the 'block' flag is not set.
>
> > 178 120.031078 client: NFS V3 GETATTR Call (Reply In 179), FH:0xcafa61cc
> > 179 120.031154 server: NFS V3 GETATTR Reply (Call In 178) Regular File mode:0644 uid:363 gid:1500
> > 180 120.033973 client: NFS V3 ACCESS Call (Reply In 181), FH:0x0308030a
> > 181 120.034030 server: NFS V3 ACCESS Reply (Call In 180)
> > 182 120.034223 client: NFS V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc
> > 183 120.034285 server: NFS V3 LOOKUP Reply (Call In 182), FH:0x81685ca0
> > 184 120.034472 client: NFS V3 ACCESS Call (Reply In 185), FH:0x0308030c
> > 185 120.034526 server: NFS V3 ACCESS Reply (Call In 184)
> > 186 120.034722 client: NFS V3 ACCESS Call (Reply In 187), FH:0x0308030c
> > 187 120.034776 server: NFS V3 ACCESS Reply (Call In 186)
> > 188 120.034922 client: NFS V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest
> > 189 120.034993 server: NFS V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc
> > 190 120.035172 client: NFS V3 ACCESS Call (Reply In 191), FH:0xcafa61cc
> > 191 120.035230 server: NFS V3 ACCESS Reply (Call In 190)
> > 193 122.032218 client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
> > 194 122.032253 client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
> > 195 122.032343 server: NLM V4 UNLOCK Reply (Call In 193)
> > 197 122.032794 server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED
> > 201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
> > 202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201)
> > 205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
> > 206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205)
>
> What happened here? Why did the client refuse the lock for svid 116?
>
> Did the task get signalled? If so, where is the CANCEL request?

The task did not get signaled; there is no CANCEL.

>
> > 207 122.036312 client: NFS V3 GETATTR Call (Reply In 208), FH:0xcafa61cc
> > 208 122.036394 server: NFS V3 GETATTR Reply (Call In 207) Regular File mode:0644 uid:363 gid:1500
> > 209 122.036611 client: NLM V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0
> > 210 122.036674 server: NLM V4 LOCK Reply (Call In 209) NLM_BLOCKED
> > 213 125.027091 client: NLM V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0
> > 214 125.027194 server: NLM V4 UNLOCK Reply (Call In 213)
> > 215 125.029487 client: NFS V3 GETATTR Call (Reply In 216), FH:0xcafa61cc
> > 216 125.029570 server: NFS V3 GETATTR Reply (Call In 215) Regular File mode:0644 uid:363 gid:1500
> > 217 125.029836 client: NLM V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0
> > 218 125.029895 server: NLM V4 LOCK Reply (Call In 217) NLM_BLOCKED
> > 224 152.032157 client: NLM V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0
> > 225 152.032283 server: NLM V4 LOCK Reply (Call In 224) NLM_BLOCKED
> > 226 152.035103 client: NLM V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0
> > 227 152.035157 server: NLM V4 LOCK Reply (Call In 226) NLM_BLOCKED
> > 230 155.029676 client: NLM V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0
> > 231 155.029761 server: NLM V4 LOCK Reply (Call In 230) NLM_BLOCKED
> >
> > To recap the problem: one of the fcntl calls to obtain a write lock
> > returns
> >
> > lck: fcntl: No locks available
> >
> > shortly after the grace period expires. After that everything gets stuck,
> > server holding a write lock with no corresponding client side lock.
> >
> >
> > IMO looks like the client is to blame, even if/when the server
> > should/could have accepted UNLOCK during grace (I don't know, I'm not
> > an expert on that one).
>
> Possibly... It depends entirely on what happened to cause it to deny the
> GRANTED callback...

A little theorizing: if the unlock of a not-yet-recovered lock has kept
failing up to that point, then the client must still remember the lock
somehow. That might explain the secondary error when a conflicting lock
is granted by the server.

--
Frank

2009-02-12 18:17:33

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
> A little theorizing:
> If the unlock of a yet unrecovered lock has failed up to that point then
> the client sure must remember the lock somehow. That might explain the
> secondary error when a conflicting lock is granted by the server.

Sorry, but that doesn't hold water. The client will release the VFS
'mirror' of the lock before it attempts to unlock. Otherwise, you could
have some nasty races between the unlock thread and the recovery
thread...
Besides, the granted callback handler on the client only checks the list
of blocked locks for a match.

Oh, bugger, I know what this is... It's the same thing that happened to
the NFSv4 callback server. If you compile with CONFIG_IPV6 or
CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then
the NLM server will listen on an IPv6 socket, and so the RPC requests
come in with their IPv4 addresses mapped into the IPv6 namespace.
The client, on the other hand, is using an IPv4 socket, 'cos you
specified an IPv4 address to the mount command.
The result is that the call to nlm_cmp_addr() in nlmclnt_grant() always
fails...

Basically, we need to replace nlm_cmp_addr() with something akin to
nfs_sockaddr_match_ipaddr(), which will compare v4 mapped addresses.
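
A rough userspace illustration of that kind of comparison, treating an IPv4
address and its v4-mapped IPv6 form (::ffff:a.b.c.d) as equal. This is a
sketch only, not the kernel's nlm_cmp_addr() or nfs_sockaddr_match_ipaddr():

#include <string.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Compare two socket addresses; an AF_INET address and the corresponding
 * v4-mapped AF_INET6 address are considered equal. Ports are ignored. */
static bool addr_match(const struct sockaddr *a, const struct sockaddr *b)
{
    if (a->sa_family == AF_INET6 && b->sa_family == AF_INET) {
        const struct sockaddr *tmp = a; a = b; b = tmp;
    }
    if (a->sa_family == AF_INET && b->sa_family == AF_INET6) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)a;
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)b;

        if (!IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr))
            return false;
        /* the last 4 bytes of a v4-mapped address hold the IPv4 address */
        return memcmp(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4) == 0;
    }
    if (a->sa_family != b->sa_family)
        return false;
    if (a->sa_family == AF_INET) {
        const struct sockaddr_in *x = (const struct sockaddr_in *)a;
        const struct sockaddr_in *y = (const struct sockaddr_in *)b;
        return x->sin_addr.s_addr == y->sin_addr.s_addr;
    }
    if (a->sa_family == AF_INET6) {
        const struct sockaddr_in6 *x = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *y = (const struct sockaddr_in6 *)b;
        return memcmp(&x->sin6_addr, &y->sin6_addr,
                      sizeof(x->sin6_addr)) == 0;
    }
    return false;
}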

The workaround should be simply to turn off CONFIG_SUNRPC_REGISTER_V4 if
you're not planning on ever using NFS-over-IPv6...

Cheers
Trond


2009-02-12 18:29:45

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
> > A little theorizing:
> > If the unlock of a yet unrecovered lock has failed up to that point then
> > the client sure must remember the lock somehow. That might explain the
> > secondary error when a conflicting lock is granted by the server.
>
> Sorry, but that doesn't hold water. The client will release the VFS
> 'mirror' of the lock before it attempts to unlock. Otherwise, you could
> have some nasty races between the unlock thread and the recovery
> thread...
> Besides, the granted callback handler on the client only checks the list
> of blocked locks for a match.

ok, then we have more than one NLM bug to resolve.

>
> Oh, bugger, I know what this is... It's the same thing that happened to
> the NFSv4 callback server. If you compile with CONFIG_IPV6 or
> CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then
> the NLM server will listen on an IPv6 socket, and so the RPC request
> come in with their IPv4 address mapped into the IPv6 namespace.

Nope:

$ zgrep IPV6 /proc/config.gz
# CONFIG_IPV6 is not set
$ zgrep SUNRPC /proc/config.gz
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
# CONFIG_SUNRPC_BIND34 is not set


And remember this is not a recent regression.

--
Frank

2009-02-12 19:10:45

by Trond Myklebust

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote:
> On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
> > > A little theorizing:
> > > If the unlock of a yet unrecovered lock has failed up to that point then
> > > the client sure must remember the lock somehow. That might explain the
> > > secondary error when a conflicting lock is granted by the server.
> >
> > Sorry, but that doesn't hold water. The client will release the VFS
> > 'mirror' of the lock before it attempts to unlock. Otherwise, you could
> > have some nasty races between the unlock thread and the recovery
> > thread...
> > Besides, the granted callback handler on the client only checks the list
> > of blocked locks for a match.
>
> ok, then we have more than one NLM bug to resolve.
>
> >
> > Oh, bugger, I know what this is... It's the same thing that happened to
> > the NFSv4 callback server. If you compile with CONFIG_IPV6 or
> > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then
> > the NLM server will listen on an IPv6 socket, and so the RPC request
> > come in with their IPv4 address mapped into the IPv6 namespace.
>
> Nope:
>
> $ zgrep IPV6 /proc/config.gz
> # CONFIG_IPV6 is not set
> $ zgrep SUNRPC /proc/config.gz
> CONFIG_SUNRPC=y
> CONFIG_SUNRPC_GSS=y
> # CONFIG_SUNRPC_BIND34 is not set

Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is
specific to 2.6.29. Chuck, are you planning on fixing this before
2.6.29-final comes out?

> And remember this is not a recent regression.

It would help if you sent us the full binary tcpdump, instead of just
the summary. That should enable us to figure out which of the tests is
failing in nlmclnt_grant().

Trond


2009-02-12 19:16:11

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, Feb 12, 2009 at 02:10:37PM -0500, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote:
> > On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote:
> > > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
> > > > A little theorizing:
> > > > If the unlock of a yet unrecovered lock has failed up to that point then
> > > > the client sure must remember the lock somehow. That might explain the
> > > > secondary error when a conflicting lock is granted by the server.
> > >
> > > Sorry, but that doesn't hold water. The client will release the VFS
> > > 'mirror' of the lock before it attempts to unlock. Otherwise, you could
> > > have some nasty races between the unlock thread and the recovery
> > > thread...
> > > Besides, the granted callback handler on the client only checks the list
> > > of blocked locks for a match.
> >
> > ok, then we have more than one NLM bug to resolve.
> >
> > >
> > > Oh, bugger, I know what this is... It's the same thing that happened to
> > > the NFSv4 callback server. If you compile with CONFIG_IPV6 or
> > > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then
> > > the NLM server will listen on an IPv6 socket, and so the RPC request
> > > come in with their IPv4 address mapped into the IPv6 namespace.
> >
> > Nope:
> >
> > $ zgrep IPV6 /proc/config.gz
> > # CONFIG_IPV6 is not set
> > $ zgrep SUNRPC /proc/config.gz
> > CONFIG_SUNRPC=y
> > CONFIG_SUNRPC_GSS=y
> > # CONFIG_SUNRPC_BIND34 is not set
>
> Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is
> specific to 2.6.29. Chuck, are you planning on fixing this before
> 2.6.29-final comes out?
>
> > And remember this is not a recent regression.
>
> It would help if you sent us the full binary tcpdump, instead of just
> the summary. That should enable us to figure out which of the tests is
> failing in nlmclnt_grant().

I posted the link already. Anyway, see attachment.

--
Frank


Attachments:
2.6.27.14-nlm-grace.pcap (32.92 kB)

2009-02-12 19:35:38

by Chuck Lever

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Feb 12, 2009, at 2:10 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote:
>> On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote:
>>> On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
>>>> A little theorizing:
>>>> If the unlock of a yet unrecovered lock has failed up to that
>>>> point then
>>>> the client sure must remember the lock somehow. That might
>>>> explain the
>>>> secondary error when a conflicting lock is granted by the server.
>>>
>>> Sorry, but that doesn't hold water. The client will release the VFS
>>> 'mirror' of the lock before it attempts to unlock. Otherwise, you
>>> could
>>> have some nasty races between the unlock thread and the recovery
>>> thread...
>>> Besides, the granted callback handler on the client only checks
>>> the list
>>> of blocked locks for a match.
>>
>> ok, then we have more than one NLM bug to resolve.
>>
>>>
>>> Oh, bugger, I know what this is... It's the same thing that
>>> happened to
>>> the NFSv4 callback server. If you compile with CONFIG_IPV6 or
>>> CONFIG_IPV6_MODULE enabled, and also set
>>> CONFIG_SUNRPC_REGISTER_V4, then
>>> the NLM server will listen on an IPv6 socket, and so the RPC request
>>> come in with their IPv4 address mapped into the IPv6 namespace.
>>
>> Nope:
>>
>> $ zgrep IPV6 /proc/config.gz
>> # CONFIG_IPV6 is not set
>> $ zgrep SUNRPC /proc/config.gz
>> CONFIG_SUNRPC=y
>> CONFIG_SUNRPC_GSS=y
>> # CONFIG_SUNRPC_BIND34 is not set
>
> Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses
> bug is
> specific to 2.6.29. Chuck, are you planning on fixing this before
> 2.6.29-final comes out?

I wasn't sure exactly where the compared addresses came from. I had
assumed that they all came through the listener, so we wouldn't need
this kind of translation. It shouldn't be difficult to map addresses
passed in via nlmclnt_init() to AF_INET6.

But this is the kind of thing that makes "falling back" to an AF_INET
listener a little challenging. We will have to record what flavor the
listener is and do a translation depending on what listener family was
actually created.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-12 19:43:22

by Trond Myklebust

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
> I wasn't sure exactly where the compared addresses came from. I had
> assumed that they all came through the listener, so we wouldn't need
> this kind of translation. It shouldn't be difficult to map addresses
> passed in via nlmclnt_init() to AF_INET6.
>
> But this is the kind of thing that makes "falling back" to an AF_INET
> listener a little challenging. We will have to record what flavor the
> listener is and do a translation depending on what listener family was
> actually created.

Why? Should we care whether we're receiving IPv4 addresses or IPv6
v4-mapped addresses? They're the same thing...

We're already doing the mapping for the NFSv4 callback channel. See
nfs_sockaddr_match_ipaddr() in fs/nfs/client.c

Trond


2009-02-12 20:11:58

by Chuck Lever

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
>> I wasn't sure exactly where the compared addresses came from. I had
>> assumed that they all came through the listener, so we wouldn't need
>> this kind of translation. It shouldn't be difficult to map addresses
>> passed in via nlmclnt_init() to AF_INET6.
>>
>> But this is the kind of thing that makes "falling back" to an AF_INET
>> listener a little challenging. We will have to record what flavor
>> the
>> listener is and do a translation depending on what listener family
>> was
>> actually created.
>
> Why? Should we care whether we're receiving IPv4 addresses or IPv6
> v4-mapped addresses? They're the same thing...

The problem is that the listener family is now decided at run-time. If an
AF_INET6 listener can't be created, an AF_INET listener is created
instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
AF_INET listener is created, we get only IPv4 addresses in
svc_rqst->rq_addr.

So we can do it either way. Taking lockd as an example:

1. Have nlmclnt_init() map AF_INET mount addresses to AF_INET6 iff
the lockd listener is AF_INET6, so nlm_cmp_addr() is always dealing
with AF_INET6 in this case, or

2. If CONFIG_IPV6 || CONFIG_IPV6_MODULE, unconditionally map AF_INET
addresses in nlmclnt_init and for incoming NLM requests (when lockd
happens to have fallen back to an AF_INET listener)

Personally I think solution 1. will be less confusing operationally
and less invasive code-wise. I suppose IPv6 purists would prefer
keeping the whole stack in AF_INET6, so they would like solution 2.

Eventually we could map incoming addresses on AF_INET listeners in the
RPC server code, but I prefer to wait until all kernel RPC services
have IPv6 support.
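
A sketch of the mapping step solution 1 describes, converting an AF_INET
mount address into its v4-mapped AF_INET6 equivalent; the function name and
placement are illustrative, not the actual nlmclnt_init() change:

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Build the v4-mapped AF_INET6 form (::ffff:a.b.c.d) of an AF_INET
 * address, preserving the port. Illustrative userspace version only. */
static void map_ipv4_to_ipv6(const struct sockaddr_in *sin,
                             struct sockaddr_in6 *sin6)
{
    memset(sin6, 0, sizeof(*sin6));
    sin6->sin6_family = AF_INET6;
    sin6->sin6_port = sin->sin_port;
    /* the ::ffff:0:0/96 prefix, then the IPv4 address */
    sin6->sin6_addr.s6_addr[10] = 0xff;
    sin6->sin6_addr.s6_addr[11] = 0xff;
    memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4);
}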

Since 2.6.29 has the CONFIG_SUNRPC_REGISTER_V4=N workaround, do we
need to fix 2.6.29, or can this wait until 2.6.30?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-12 20:24:18

by Trond Myklebust

Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 20:16 +0100, Frank van Maarseveen wrote:
> On Thu, Feb 12, 2009 at 02:10:37PM -0500, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote:
> > > On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote:
> > > > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote:
> > > > > A little theorizing:
> > > > > If the unlock of a yet unrecovered lock has failed up to that point then
> > > > > the client sure must remember the lock somehow. That might explain the
> > > > > secondary error when a conflicting lock is granted by the server.
> > > >
> > > > Sorry, but that doesn't hold water. The client will release the VFS
> > > > 'mirror' of the lock before it attempts to unlock. Otherwise, you could
> > > > have some nasty races between the unlock thread and the recovery
> > > > thread...
> > > > Besides, the granted callback handler on the client only checks the list
> > > > of blocked locks for a match.
> > >
> > > ok, then we have more than one NLM bug to resolve.
> > >
> > > >
> > > > Oh, bugger, I know what this is... It's the same thing that happened to
> > > > the NFSv4 callback server. If you compile with CONFIG_IPV6 or
> > > > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then
> > > > the NLM server will listen on an IPv6 socket, and so the RPC request
> > > > come in with their IPv4 address mapped into the IPv6 namespace.
> > >
> > > Nope:
> > >
> > > $ zgrep IPV6 /proc/config.gz
> > > # CONFIG_IPV6 is not set
> > > $ zgrep SUNRPC /proc/config.gz
> > > CONFIG_SUNRPC=y
> > > CONFIG_SUNRPC_GSS=y
> > > # CONFIG_SUNRPC_BIND34 is not set
> >
> > Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is
> > specific to 2.6.29. Chuck, are you planning on fixing this before
> > 2.6.29-final comes out?
> >
> > > And remember this is not a recent regression.
> >
> > It would help if you sent us the full binary tcpdump, instead of just
> > the summary. That should enable us to figure out which of the tests is
> > failing in nlmclnt_grant().
>
> I posted the link already. Anyway, see attachment.

Yeah... It looks alright. The one thing that looks a bit odd is that the
GRANTED lock has a 'caller_name' field set to the name of the server. I'm
pretty sure we don't care about that, though...

Hmm... I wonder if the problem isn't just that we're failing to cancel
the lock request when the process is signalled. Can you try the
following patch?

--------------------------------------------------------------------
From: Trond Myklebust <[email protected]>
NLM/lockd: Always cancel blocked locks when exiting early from nlmclnt_lock

Signed-off-by: Trond Myklebust <[email protected]>
---

fs/lockd/clntproc.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)


diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 31668b6..f956d1e 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -542,9 +542,14 @@ again:
 		status = nlmclnt_call(cred, req, NLMPROC_LOCK);
 		if (status < 0)
 			break;
-		/* Did a reclaimer thread notify us of a server reboot? */
-		if (resp->status == nlm_lck_denied_grace_period)
+		/* Is the server in a grace period state?
+		 * If so, we need to reset the resp->status, and
+		 * retry...
+		 */
+		if (resp->status == nlm_lck_denied_grace_period) {
+			resp->status = nlm_lck_blocked;
 			continue;
+		}
 		if (resp->status != nlm_lck_blocked)
 			break;
 		/* Wait on an NLM blocking lock */



2009-02-12 20:27:43

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
> >> I wasn't sure exactly where the compared addresses came from. I had
> >> assumed that they all came through the listener, so we wouldn't need
> >> this kind of translation. It shouldn't be difficult to map addresses
> >> passed in via nlmclnt_init() to AF_INET6.
> >>
> >> But this is the kind of thing that makes "falling back" to an AF_INET
> >> listener a little challenging. We will have to record what flavor
> >> the
> >> listener is and do a translation depending on what listener family
> >> was
> >> actually created.
> >
> > Why? Should we care whether we're receiving IPv4 addresses or IPv6
> > v4-mapped addresses? They're the same thing...
>
> The problem is the listener family is now decided at run-time. If an
> AF_INET6 listener can't be created, an AF_INET listener is created
> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
> AF_INET listener is created, we get only IPv4 addresses in svc_rqst-
> >rq_addr.

You're missing my point. Why should we care if it's one or the other? In
the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
turns out that CONFIG_IPV6 is enabled.

IOW: we always compare IPv6 addresses.
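
For readers following along: a v4-mapped address is just an IPv4 address
embedded in the ::ffff:0:0/96 prefix, so an IPv4 peer and its v4-mapped
form name the same endpoint. A minimal sketch of the conversion, written
against ordinary socket headers rather than the actual kernel helpers
(the function name here is made up for illustration):

#include <string.h>
#include <netinet/in.h>

/* Embed an IPv4 sockaddr_in in a v4-mapped sockaddr_in6 (::ffff:a.b.c.d). */
static void map_ipv4_to_ipv6(const struct sockaddr_in *sin,
                             struct sockaddr_in6 *sin6)
{
        memset(sin6, 0, sizeof(*sin6));
        sin6->sin6_family = AF_INET6;
        sin6->sin6_port = sin->sin_port;
        /* Bytes 0-9 are zero, bytes 10-11 are 0xff, bytes 12-15 hold the IPv4 address. */
        sin6->sin6_addr.s6_addr[10] = 0xff;
        sin6->sin6_addr.s6_addr[11] = 0xff;
        memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4);
}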

Trond


2009-02-12 20:43:28

by Chuck Lever

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
>> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
>>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
>>>> I wasn't sure exactly where the compared addresses came from. I
>>>> had
>>>> assumed that they all came through the listener, so we wouldn't
>>>> need
>>>> this kind of translation. It shouldn't be difficult to map
>>>> addresses
>>>> passed in via nlmclnt_init() to AF_INET6.
>>>>
>>>> But this is the kind of thing that makes "falling back" to an
>>>> AF_INET
>>>> listener a little challenging. We will have to record what flavor
>>>> the
>>>> listener is and do a translation depending on what listener family
>>>> was
>>>> actually created.
>>>
>>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
>>> v4-mapped addresses? They're the same thing...
>>
>> The problem is the listener family is now decided at run-time. If an
>> AF_INET6 listener can't be created, an AF_INET listener is created
>> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
>> AF_INET listener is created, we get only IPv4 addresses in svc_rqst-
>>> rq_addr.
>
> You're missing my point. Why should we care if it's one or the
> other? In
> the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
> turns out that CONFIG_IPV6 is enabled.
>
> IOW: we always compare IPv6 addresses.

The reason we might care in this case is that nlm_cmp_addr() is executed
more frequently than nfs_sockaddr_match_ipaddr().

Mapping the server address in nlmclnt_init() means we translate the
server address once and are done with it. We never have to map
incoming AF_INET addresses in NLM requests, and we don't have the
extra conditionals every time we go through nlm_cmp_addr().

This keeps nlm_cmp_addr() as simple as it can be: it compares only two
AF_INET addresses or two AF_INET6 addresses.
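
As a sketch of the same-family comparison described above (illustrative
only; the real nlm_cmp_addr() in fs/lockd may differ in detail):

#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Compare two addresses, assuming callers only ever pass matching families. */
static int cmp_addr(const struct sockaddr *a, const struct sockaddr *b)
{
        if (a->sa_family != b->sa_family)
                return 0;
        switch (a->sa_family) {
        case AF_INET:
                return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
                       ((const struct sockaddr_in *)b)->sin_addr.s_addr;
        case AF_INET6:
                return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
                              &((const struct sockaddr_in6 *)b)->sin6_addr,
                              sizeof(struct in6_addr)) == 0;
        }
        return 0;
}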

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-12 20:54:50

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
> >> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
> >>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
> >>>> I wasn't sure exactly where the compared addresses came from. I
> >>>> had
> >>>> assumed that they all came through the listener, so we wouldn't
> >>>> need
> >>>> this kind of translation. It shouldn't be difficult to map
> >>>> addresses
> >>>> passed in via nlmclnt_init() to AF_INET6.
> >>>>
> >>>> But this is the kind of thing that makes "falling back" to an
> >>>> AF_INET
> >>>> listener a little challenging. We will have to record what flavor
> >>>> the
> >>>> listener is and do a translation depending on what listener family
> >>>> was
> >>>> actually created.
> >>>
> >>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
> >>> v4-mapped addresses? They're the same thing...
> >>
> >> The problem is the listener family is now decided at run-time. If an
> >> AF_INET6 listener can't be created, an AF_INET listener is created
> >> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
> >> AF_INET listener is created, we get only IPv4 addresses in svc_rqst-
> >>> rq_addr.
> >
> > You're missing my point. Why should we care if it's one or the
> > other? In
> > the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
> > turns out that CONFIG_IPV6 is enabled.
> >
> > IOW: we always compare IPv6 addresses.
>
> The reason we might care in this case is nlm_cmp_addr() is executed
> more frequently than nfs_sockaddr_match_ipaddr().
>
> Mapping the server address in nlmclnt_init() means we translate the
> server address once and are done with it. We never have to map
> incoming AF_INET addresses in NLM requests, and we don't have the
> extra conditionals every time we go through nlm_cmp_addr().
>
> This keeps nlm_cmp_addr() as simple as it can be: it compares only two
> AF_INET addresses or two AF_INET6 addresses.

I don't see how that changes the general principle. All it means is that
you should be caching v4-mapped addresses instead of IPv4 addresses.
That would allow you to simplify nlm_cmp_addr() even further...

Trond


2009-02-12 21:44:08

by Chuck Lever

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Feb 12, 2009, at 3:54 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
>> On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
>>> On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
>>>> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
>>>>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
>>>>>> I wasn't sure exactly where the compared addresses came from. I
>>>>>> had
>>>>>> assumed that they all came through the listener, so we wouldn't
>>>>>> need
>>>>>> this kind of translation. It shouldn't be difficult to map
>>>>>> addresses
>>>>>> passed in via nlmclnt_init() to AF_INET6.
>>>>>>
>>>>>> But this is the kind of thing that makes "falling back" to an
>>>>>> AF_INET
>>>>>> listener a little challenging. We will have to record what
>>>>>> flavor
>>>>>> the
>>>>>> listener is and do a translation depending on what listener
>>>>>> family
>>>>>> was
>>>>>> actually created.
>>>>>
>>>>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
>>>>> v4-mapped addresses? They're the same thing...
>>>>
>>>> The problem is the listener family is now decided at run-time.
>>>> If an
>>>> AF_INET6 listener can't be created, an AF_INET listener is created
>>>> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled.
>>>> If an
>>>> AF_INET listener is created, we get only IPv4 addresses in
>>>> svc_rqst-
>>>>> rq_addr.
>>>
>>> You're missing my point. Why should we care if it's one or the
>>> other? In
>>> the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
>>> turns out that CONFIG_IPV6 is enabled.
>>>
>>> IOW: we always compare IPv6 addresses.
>>
>> The reason we might care in this case is nlm_cmp_addr() is executed
>> more frequently than nfs_sockaddr_match_ipaddr().
>>
>> Mapping the server address in nlmclnt_init() means we translate the
>> server address once and are done with it. We never have to map
>> incoming AF_INET addresses in NLM requests, and we don't have the
>> extra conditionals every time we go through nlm_cmp_addr().
>>
>> This keeps nlm_cmp_addr() as simple as it can be: it compares only
>> two
>> AF_INET addresses or two AF_INET6 addresses.
>
> I don't see how that changes the general principle. All it means is
> that
> you should be caching v4 mapped addresses instead of ipv4 addresses.
> That would allow you to simplify nlm_cmp_addr() even further...

Operationally we have to support both AF_INET and AF_INET6 addresses
in the cache, because we don't know what kind of lockd listener can be
created until runtime. So, I can't see how we can eliminate the
AF_INET arm in nlm_cmp_addr() unless we unconditionally convert all
incoming AF_INET addresses from putative PF_INET listeners _and_
convert incoming IPv4 server addresses in NFS mount requests to
AF_INET6.

Doesn't that add computational overhead to a fairly common case?

This goes away if we ensure that the address family of the server
address passed to nlmclnt_lookup_host() always matches the protocol
family of lockd's listener sockets. Then address mapping overhead is
entirely removed from the common cases involving PF_INET listeners.

For PF_INET6 listeners, incoming IPv4 addresses are already mapped by
the underlying network layer. Nothing can be done about that. But we
can make sure the address family of the server address passed to
nlmclnt_lookup_host() matches the incoming mapped addresses to
eliminate the need for nlm_cmp_addr() to do the mapping every time it
wants to compare an address.

It should be fairly simple to record the listener's protocol family,
check it against incoming server addresses in nlmclnt_init(), then map
the address as needed.

Having nlm_cmp_addr() do the mapping solves some problems, but at the
cost of extra CPU time every time it is called: in each loop iteration of
nlm_lookup_host(), for example. Essentially, all I'm doing is hoisting a
loop invariant out of the loop.
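
A sketch of that hoisting, under the assumption that we can ask which
listener family lockd ended up with (lockd_listener_family() and
init_server_addr() are hypothetical names, not the actual lockd code):

#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical: the family of the listener lockd actually created. */
extern sa_family_t lockd_listener_family(void);

/*
 * Translate the server address once, at setup time, so the per-request
 * comparison never has to handle mixed families.  Only needed when the
 * listener is PF_INET6, i.e. when IPv4 peers will show up v4-mapped.
 */
static void init_server_addr(const struct sockaddr *src, socklen_t srclen,
                             struct sockaddr_storage *dst)
{
        if (src->sa_family == AF_INET &&
            lockd_listener_family() == AF_INET6) {
                const struct sockaddr_in *sin = (const struct sockaddr_in *)src;
                struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)dst;

                /* Build the v4-mapped form ::ffff:a.b.c.d once, here. */
                memset(sin6, 0, sizeof(*sin6));
                sin6->sin6_family = AF_INET6;
                sin6->sin6_port = sin->sin_port;
                sin6->sin6_addr.s6_addr[10] = 0xff;
                sin6->sin6_addr.s6_addr[11] = 0xff;
                memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4);
                return;
        }
        /* Families already match the listener; keep the address as-is. */
        memcpy(dst, src, srclen);
}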

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-12 22:02:23

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> The reason we might care in this case is nlm_cmp_addr() is executed
> more frequently than nfs_sockaddr_match_ipaddr().

Actually, I'm not sure this assertion is correct. The only users of
nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
nlmsvc_unlock_all_by_ip().

AFAICS, the only one that needs to be v4-mapped is nlmclnt_grant(),
which is not in a performance-critical path...



2009-02-12 22:03:55

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 16:43 -0500, Chuck Lever wrote:
> Having nlm_cmp_addr() do the mapping solves some problems, but at the
> cost of extra CPU time every time it is called; each loop iteration in
> nlm_lookup_host() for example. All I'm doing is removing a loop
> invariant, essentially.

nlm_lookup_host() shouldn't need to compare v4-mapped addresses and IPv4
addresses, AFAICS.



2009-02-12 22:11:36

by Chuck Lever

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Feb 12, 2009, at 5:02 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
>> The reason we might care in this case is nlm_cmp_addr() is executed
>> more frequently than nfs_sockaddr_match_ipaddr().
>
> Actually, I'm not sure this assertion is correct. The only users of
> nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
> nlmsvc_unlock_all_by_ip().
>
> AFAICS, the only one that needs to be v4 mapped should be
> nlmclnt_grant,
> which is not in a performance critical path...

So then your proposal is to ensure the two arguments of the
nlm_cmp_addr() callsite in nlmclnt_grant() are both AF_INET6?

That doesn't sound so bad.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

2009-02-12 22:19:25

by Trond Myklebust

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, 2009-02-12 at 17:11 -0500, Chuck Lever wrote:
> On Feb 12, 2009, at 5:02 PM, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> >> The reason we might care in this case is nlm_cmp_addr() is executed
> >> more frequently than nfs_sockaddr_match_ipaddr().
> >
> > Actually, I'm not sure this assertion is correct. The only users of
> > nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
> > nlmsvc_unlock_all_by_ip().
> >
> > AFAICS, the only one that needs to be v4 mapped should be
> > nlmclnt_grant,
> > which is not in a performance critical path...
>
> So then your proposal is to ensure the two arguments of the
> nlm_cmp_addr() callsite in nlmclnt_grant() are both AF_INET6?

Yup... I can't see that the other two callsites need anything like that.



2009-02-13 11:04:58

by Frank van Maarseveen

[permalink] [raw]
Subject: Re: [NLM] 2.6.27.14 breakage when grace period expires

On Thu, Feb 12, 2009 at 03:24:11PM -0500, Trond Myklebust wrote:
>
> Hmm... I wonder if the problem isn't just that we're failing to cancel
> the lock request when the process is signalled. Can you try the
> following patch?
>
> --------------------------------------------------------------------
> From: Trond Myklebust <[email protected]>
> NLM/lockd: Always cancel blocked locks when exiting early from nlmclnt_lock
>
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
>
> fs/lockd/clntproc.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
>
> diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> index 31668b6..f956d1e 100644
> --- a/fs/lockd/clntproc.c
> +++ b/fs/lockd/clntproc.c
> @@ -542,9 +542,14 @@ again:
> status = nlmclnt_call(cred, req, NLMPROC_LOCK);
> if (status < 0)
> break;
> - /* Did a reclaimer thread notify us of a server reboot? */
> - if (resp->status == nlm_lck_denied_grace_period)
> + /* Is the server in a grace period state?
> + * If so, we need to reset the resp->status, and
> + * retry...
> + */
> + if (resp->status == nlm_lck_denied_grace_period) {
> + resp->status = nlm_lck_blocked;
> continue;
> + }
> if (resp->status != nlm_lck_blocked)
> break;
> /* Wait on an NLM blocking lock */

I tried the patch, but it didn't make any difference. Note that there is
no ^C or any other signal involved. The client runs three loops in the
shell:
while :; do lck -w /mnt/locktest 2; done

and every "lck" opens the file, obtains an exclusive write lock (waits
if necessary), calls sleep(2), closes the fd (releasing the lock) and
goes exit.
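
The original lck source isn't reproduced here, but a minimal program with
the behaviour just described might look like the following sketch (it
ignores the real program's "-w" flag and timeout argument and simply
hard-codes the two-second hold):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Grab an exclusive write lock on argv[1], hold it for 2 seconds, then exit. */
int main(int argc, char **argv)
{
        struct flock fl = {
                .l_type   = F_WRLCK,    /* exclusive write lock */
                .l_whence = SEEK_SET,
                .l_start  = 0,
                .l_len    = 0,          /* lock the whole file */
        };
        int fd;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
                perror("lck: open");
                return 1;
        }
        if (fcntl(fd, F_SETLKW, &fl) < 0) {     /* block until the lock is granted */
                perror("lck: fcntl");
                return 1;
        }
        sleep(2);
        close(fd);      /* closing the fd releases the lock */
        return 0;
}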

The "lck" which ends up unlocking during grace terminates normally but
one of the others gets a "fcntl: No locks available" when trying to
obtain the lock.


Question: shouldn't the server drop the lock after a sequence like:

201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201)
205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205)

?

--
Frank