2010-06-22 19:03:43

by Chuck Lever III

Subject: Connectathon locking test fails over NFSv3 with EBUSY

It looks like the connectathon tests race with the removal of deleted
files. The actual lock test is successful, but when the scripts attempt
to reset the test directory for another pass, the RMDIR fails because
the directory is full of ".nfsxxx" files.

Seems like RMDIR should wait for those silly deletes before trying to
remove the parent directory.

I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens
nearly every time.
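The idea that RMDIR should wait for pending silly deletes can be approximated from user space. Below is a minimal C sketch built around a hypothetical helper named `rmdir_with_retry`, which is not part of the Connectathon suite or any NFS tool. It retries `rmdir(2)` a few times while the call fails with `EBUSY` or `ENOTEMPTY`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: retry rmdir(2) while it fails with EBUSY or
 * ENOTEMPTY, giving a pending NFS silly delete a chance to complete.
 * Returns 0 on success, or -1 with errno set from the last rmdir() call.
 */
int rmdir_with_retry(const char *path, int attempts, long delay_ms)
{
    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    if (attempts < 1)
        attempts = 1;

    for (int i = 0; i < attempts; i++) {
        if (rmdir(path) == 0)
            return 0;
        if (errno != EBUSY && errno != ENOTEMPTY)
            return -1;          /* ENOENT, EACCES, ...: waiting won't help */
        if (i + 1 < attempts)
            nanosleep(&delay, NULL);
    }
    return -1;                  /* errno still holds EBUSY or ENOTEMPTY */
}
```

A call such as `rmdir_with_retry("/mnt/klimt/ellison.test", 10, 200)` would wait roughly two seconds before giving up. Later messages in this thread show that the `.nfs` files stayed until oprofiled was killed. A retry loop therefore helps only when the delay is transient. It cannot fix a reference that is never released.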


Test #15 - Test 2nd open and I/O after lock and close.
Parent: Second open succeeded.
Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED.
Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED.
Parent: Closed testfile.
Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ].
Parent: Read 'abcdefghij' from testfile [ 0, 11 ].
Parent: 15.2 - COMPARE [ 0, b] PASSED.

** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).

** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
Congratulations, you passed the locking tests!
... Pass 2 ...

rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000d8e00000041': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df100000050': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfb0000004a': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dec00000047': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df90000004b': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfa0000004e': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df80000004f': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df20000004c': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000deb00000051': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000def00000048': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dea0000004d': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000de900000049': Device or resource busy
Starting BASIC tests: test directory /mnt/klimt/ellison.test (arg: -t)
mkdir: cannot create directory `/mnt/klimt/ellison.test': File exists

./test1: File and directory creation test
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000d8e00000041': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df100000050': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfb0000004a': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dec00000047': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df90000004b': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dfa0000004e': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df80000004f': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000df20000004c': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000deb00000051': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000def00000048': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000dea0000004d': Device or resource busy
rm: cannot remove `/mnt/klimt/ellison.test/.nfs0000000000000de900000049': Device or resource busy
./test1: (/home/cel/src/cthon04/basic) can't remove old test directory
/mnt/klimt/ellison.test
basic tests failed
Tests failed, leaving /mnt/klimt mounted
[cel@ellison cthon04]$

--
Chuck Lever
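Every file that `rm` refused to delete in the output above has the `.nfs` prefix that the Linux NFS client uses for silly-renamed files. The small C sketch below lists such leftovers in a directory before a test pass tries to recreate it. The function name `count_silly_renamed` is hypothetical.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count (and optionally print) entries in `dirpath`
 * whose names start with ".nfs", i.e. files the NFS client silly-renamed
 * because they were unlinked while still in use.
 * Returns the count, or -1 if the directory cannot be opened.
 */
int count_silly_renamed(const char *dirpath, int verbose)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, ".nfs", 4) == 0) {
            count++;
            if (verbose)
                printf("%s/%s\n", dirpath, ent->d_name);
        }
    }
    closedir(dir);
    return count;
}
```

For example, `count_silly_renamed("/mnt/klimt/ellison.test", 1)` would print each leftover path and return the total. A non-zero result just before the cleanup step suggests the next pass will fail the same way.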


2010-06-23 17:51:25

by Chuck Lever III

Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

On 06/22/10 03:17 PM, Trond Myklebust wrote:
> On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote:
>> It looks like the connectathon tests race with the removal of deleted
>> files. The actual lock test is successful, but when the scripts attempt
>> to reset the test directory for another pass, the RMDIR fails because
>> the directory is full of ".nfsxxx" files.
>>
>> Seems like RMDIR should wait for those silly deletes before trying to
>> remove the parent directory.
>>
>> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens
>> nearly every time.
>>
>>
>> Test #15 - Test 2nd open and I/O after lock and close.
>> Parent: Second open succeeded.
>> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED.
>> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED.
>> Parent: Closed testfile.
>> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ].
>> Parent: Read 'abcdefghij' from testfile [ 0, 11 ].
>> Parent: 15.2 - COMPARE [ 0, b] PASSED.
>>
>> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).
>>
>> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
>> Congratulations, you passed the locking tests!
>> ... Pass 2 ...
>
> Err... Any idea what kind of operations are causing the sillyrename to
> happen? The locking tests in particular should _never_ have any
> outstanding operations post-ULOCK.

I've reproduced this by running several passes of all of the tests
("./server -a -N10") while oprofile is running. Without oprofile
running this seems to be nearly impossible to reproduce.

When a pass finishes, the RMDIR of the test directory fails because
there are .nfsxxx files left in the directory. These .nfsxxx files are
not removed later; they remain after the test fails.

Looking at the network trace, I see the RENAME that creates the files
but no REMOVE is issued for these files. Somehow, the client is
forgetting to remove them. There are plenty of proper RENAME/REMOVE
pairs in the trace, so maybe this is a race condition.
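Once the trace has been decoded, the RENAME-without-REMOVE pattern can be checked mechanically. The minimal sketch below assumes the trace has already been reduced to an array of hypothetical `trace_event` records, each holding an operation and a target name. It does not parse any real capture format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of decoded NFSv3 trace records. */
enum nfs_op { OP_RENAME, OP_REMOVE, OP_OTHER };

struct trace_event {
    enum nfs_op op;
    const char *target;   /* RENAME: new name; REMOVE: name removed */
};

/*
 * Find silly-rename targets (RENAMEs to a ".nfs..." name) that no later
 * REMOVE in the trace refers to. Stores up to `max` of them in orphans[]
 * and returns the total number found, which may exceed `max`.
 */
size_t find_orphaned_silly_renames(const struct trace_event *ev, size_t n,
                                   const char **orphans, size_t max)
{
    size_t found = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].op != OP_RENAME || strncmp(ev[i].target, ".nfs", 4) != 0)
            continue;

        int removed = 0;
        for (size_t j = i + 1; j < n; j++) {
            if (ev[j].op == OP_REMOVE &&
                strcmp(ev[j].target, ev[i].target) == 0) {
                removed = 1;
                break;
            }
        }
        if (!removed) {
            if (found < max)
                orphans[found] = ev[i].target;
            found++;
        }
    }
    return found;
}
```

The nested scan is quadratic. That is acceptable for a sketch, but a long capture would be better served by a hash set of removed names.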

I found the RENAMEs in the network trace for all the remaining .nfsxxx
files. The names are:

op_unlk, stat, op_ren, op_chmod, dupreq, excltest, negseek, rename,
holey, truncate, nfsidem, rewind, telldir, bigfile, bigfile2, freesp

These look like files created during the special tests.
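The leftover names also encode which file was renamed. This sketch assumes the Linux client's silly-rename format of this era: `.nfs`, then the file's 64-bit fileid as 16 hex digits, then a 32-bit counter as 8 hex digits. That layout is consistent with the names in the `rm` output. Under that assumption, a small parser can recover the fileid so it can be matched against the RENAMEs in a trace. `parse_silly_name` is a hypothetical helper.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper. Assumes the name is ".nfs" + 16 hex digits
 * (fileid) + 8 hex digits (counter). Returns 0 on success, or -1 if
 * `name` does not have that shape.
 */
int parse_silly_name(const char *name, uint64_t *fileid, uint32_t *counter)
{
    char buf[17];

    if (strncmp(name, ".nfs", 4) != 0 || strlen(name) != 4 + 16 + 8)
        return -1;
    for (const char *p = name + 4; *p != '\0'; p++)
        if (!isxdigit((unsigned char)*p))
            return -1;

    memcpy(buf, name + 4, 16);          /* fileid field */
    buf[16] = '\0';
    *fileid = strtoull(buf, NULL, 16);

    memcpy(buf, name + 20, 8);          /* counter field */
    buf[8] = '\0';
    *counter = (uint32_t)strtoul(buf, NULL, 16);
    return 0;
}
```

For `.nfs0000000000000d8e00000041` this yields fileid 0xd8e and counter 0x41. The fileid can then be compared with the fileid attributes seen in the network trace.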

2010-06-22 19:17:19

by Myklebust, Trond

Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

On Tue, 2010-06-22 at 15:03 -0400, Chuck Lever wrote:
> It looks like the connectathon tests race with the removal of deleted
> files. The actual lock test is successful, but when the scripts attempt
> to reset the test directory for another pass, the RMDIR fails because
> the directory is full of ".nfsxxx" files.
>
> Seems like RMDIR should wait for those silly deletes before trying to
> remove the parent directory.
>
> I've seen this with both 2.6.34 and 2.6.35-rc3 clients, and it happens
> nearly every time.
>
>
> Test #15 - Test 2nd open and I/O after lock and close.
> Parent: Second open succeeded.
> Parent: 15.0 - F_LOCK [ 0, ENDING] PASSED.
> Parent: 15.1 - F_ULOCK [ 0, ENDING] PASSED.
> Parent: Closed testfile.
> Parent: Wrote 'abcdefghij' to testfile [ 0, 11 ].
> Parent: Read 'abcdefghij' from testfile [ 0, 11 ].
> Parent: 15.2 - COMPARE [ 0, b] PASSED.
>
> ** PARENT pass 1 results: 49/49 pass, 1/1 warn, 0/0 fail (pass/total).
>
> ** CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
> Congratulations, you passed the locking tests!
> ... Pass 2 ...

Err... Any idea what kind of operations are causing the sillyrename to
happen? The locking tests in particular should _never_ have any
outstanding operations post-ULOCK.
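For context, a sillyrename happens when a file is unlinked while something still holds a reference to it. POSIX requires an open descriptor to keep working after `unlink(2)`. An NFSv3 server has no notion of open files, so the Linux client instead RENAMEs the file to a `.nfs` name and sends the REMOVE only when the last reference is dropped. The sketch below demonstrates the behaviour being emulated, using a hypothetical helper `read_after_unlink`. On a local filesystem it simply shows that the data stays readable after the name is gone.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file from `tmpl` (must end in "XXXXXX",
 * as mkstemp(3) requires), write to it, unlink it while still open, and
 * check the data can still be read back. Returns 0 on success, -1 on
 * failure. On an NFSv3 mount, the unlink is where the client would
 * silly-rename the file to ".nfs..." instead of removing it.
 */
int read_after_unlink(char *tmpl)
{
    static const char msg[] = "abcdefghij";
    char buf[sizeof msg];

    int fd = mkstemp(tmpl);                  /* create and open a unique file */
    if (fd < 0)
        return -1;

    ssize_t written = write(fd, msg, sizeof msg);
    int unlinked = (unlink(tmpl) == 0);      /* drop the name while open */

    int ok = written == (ssize_t)sizeof msg && unlinked &&
             pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
             memcmp(buf, msg, sizeof msg) == 0;

    /* Last reference: the file is freed (on NFSv3, roughly when the
     * deferred REMOVE of the .nfs name is sent). */
    close(fd);
    return ok ? 0 : -1;
}
```

If the template points into an NFSv3 mount (for example `/mnt/klimt/ellison.test/demoXXXXXX`), a `.nfs` file should appear briefly in that directory between the `unlink` and the `close`.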



2010-06-23 18:07:18

by Staubach_Peter

Subject: RE: Connectathon locking test fails over NFSv3 with EBUSY

Perhaps the oprofile support is retaining an additional reference to the in-core
inode which is causing the .nfsXXXX files to get created and is also delaying their
removal?

		ps


-----Original Message-----
From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Chuck Lever
Sent: Wednesday, June 23, 2010 1:51 PM
To: Trond Myklebust
Cc: NFSv3 list
Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

2010-06-23 18:44:05

by Myklebust, Trond

Subject: RE: Connectathon locking test fails over NFSv3 with EBUSY

On Wed, 2010-06-23 at 14:06 -0400, [email protected] wrote:
> Perhaps the oprofile support is retaining an additional reference to the in-core
> inode which is causing the .nfsXXXX files to get created and is also delaying their
> removal?

Could the files actually be temporary files that are being created by
oprofile itself? I must admit that I have little experience with running
oprofile...

Cheers
Trond




2010-06-23 19:17:22

by Chuck Lever III

Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

On 06/23/10 02:06 PM, [email protected] wrote:
> Perhaps the oprofile support is retaining an additional reference to the in-core
> inode which is causing the .nfsXXXX files to get created and is also delaying their
> removal?

The files do not appear in oprofiled's fd list (in /proc). Killing the
oprofiled process after the test finishes does make those files go away.
Just shutting down the profiler leaves oprofiled running, so additionally
killing the daemon appears to be necessary to finish the silly removal
process.

These files are all executables (part of the connectathon suite), but I
don't have the "profile user space binaries" checkbox selected.
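Checking a daemon's descriptor list amounts to resolving each `/proc/<pid>/fd/N` symlink. The minimal, Linux-only sketch below does this with a hypothetical helper `pid_has_fd_matching`. Inspecting another user's process requires appropriate privileges.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if any open descriptor of `pid` resolves
 * (via readlink on /proc/<pid>/fd/N) to a path containing `needle`, 0 if
 * none does, or -1 if the fd directory cannot be read. A substring match
 * is used because, for example, a locally unlinked file's link target
 * ends in " (deleted)".
 */
int pid_has_fd_matching(pid_t pid, const char *needle)
{
    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *ent;
    while (!found && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                    /* skip "." and ".." */

        char linkpath[PATH_MAX], target[PATH_MAX];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, ent->d_name);
        ssize_t len = readlink(linkpath, target, sizeof target - 1);
        if (len < 0)
            continue;                    /* fd closed while scanning */
        target[len] = '\0';
        if (strstr(target, needle) != NULL)
            found = 1;
    }
    closedir(dir);
    return found;
}
```

For example, `pid_has_fd_matching(oprofiled_pid, "/mnt/klimt/ellison.test/.nfs")`, where `oprofiled_pid` is a placeholder, would return 0 in the situation described here. That result means the reference is held some other way than through an open descriptor.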



2010-06-23 19:27:15

by Myklebust, Trond

Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

On Wed, 2010-06-23 at 15:17 -0400, Chuck Lever wrote:
> On 06/23/10 02:06 PM, [email protected] wrote:
> > Perhaps the oprofile support is retaining an additional reference to the in-core
> > inode which is causing the .nfsXXXX files to get created and is also delaying their
> > removal?
>
> The files do not appear in oprofiled's fd list (in /proc). Killing the
> oprofiled process after the test finishes does make those files go away.
> Just shutting down the profiler leaves oprofiled running, so additionally
> killing the daemon appears to be necessary to finish the silly removal
> process.
>
> These files are all executables (part of the connectathon suite), but I
> don't have the "profile user space binaries" checkbox selected.

OK. That makes more sense... Do these files perhaps appear in
the /proc/<pid>/maps and/or /proc/<pid>/smaps pseudofile for oprofiled?

Cheers
Trond
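Trond's question points at memory mappings, which keep a reference to a file even after its descriptor is closed. The rough sketch below counts the `/proc/<pid>/maps` entries whose pathname contains a given string. `count_maps_matching` is a hypothetical name. The pid is passed as a string so that `"self"` also works.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: count lines of /proc/<pid>/maps whose pathname
 * contains `needle`. `pid` is a decimal pid or "self". Returns -1 if the
 * maps file cannot be opened (no such process, or no permission).
 */
int count_maps_matching(const char *pid, const char *needle)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%s/maps", pid);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[8192];
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        /* The pathname is the last field and starts with '/'; the address,
         * perms, offset, device and inode fields never contain '/'. */
        const char *pathname = strchr(line, '/');
        if (pathname != NULL && strstr(pathname, needle) != NULL)
            count++;
    }
    fclose(f);
    return count;
}
```

For example, `count_maps_matching("1234", "/mnt/klimt/ellison.test")`, where 1234 stands for oprofiled's pid, would report any Connectathon binaries that remain mapped.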

2010-06-23 19:56:14

by Chuck Lever III

Subject: Re: Connectathon locking test fails over NFSv3 with EBUSY

On 06/23/10 03:26 PM, Trond Myklebust wrote:
> On Wed, 2010-06-23 at 15:17 -0400, Chuck Lever wrote:
>> On 06/23/10 02:06 PM, [email protected] wrote:
>>> Perhaps the oprofile support is retaining an additional reference to the in-core
>>> inode which is causing the .nfsXXXX files to get created and is also delaying their
>>> removal?
>>
>> The files do not appear in oprofiled's fd list (in /proc). Killing the
>> oprofiled process after the test finishes does make those files go away.
>> Just shutting down the profiler leaves oprofiled running, so additionally
>> killing the daemon appears to be necessary to finish the silly removal
>> process.
>>
>> These files are all executables (part of the connectathon suite), but I
>> don't have the "profile user space binaries" checkbox selected.
>
> OK. That makes more sense... Do these files perhaps appear in
> the /proc/<pid>/maps and/or /proc/<pid>/smaps pseudofile for oprofiled?

I don't see anything suspicious in those files.