2014-03-05 17:45:27

by Andrew Martin

[permalink] [raw]
Subject: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Hello,

Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g.
3.2 and newer)? Currently using the "defaults" nfs mount options on Ubuntu
12.04 results in processes blocking forever in uninterruptible sleep if they
attempt to access a mountpoint while the NFS server is offline. I would prefer
that NFS simply return an error to the clients after retrying a few times,
however I also cannot have data loss. From the man page, I think these options
will give that effect?
soft,proto=tcp,timeo=10,retrans=3

From my understanding, this will cause NFS to retry the connection 3 times (once
per second), and then if all 3 are unsuccessful return an error to the
application. Is this correct? Is there a risk of data loss or corruption by
using "soft" in this way? Or is there a better way to approach this?

Thanks,

Andrew Martin


2014-03-06 19:02:17

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 13:48, Jim Rees <[email protected]> wrote:

> Andrew Martin wrote:
>
>> From: "Jim Rees" <[email protected]>
>> Why would a bunch of blocked apaches cause high load and reboot?
> What I believe happens is the apache child processes go to serve
> these requests and then block in uninterruptable sleep. Thus, there
> are fewer and fewer child processes to handle new incoming requests.
> Eventually, apache would normally kill said children (e.g after a
> child handles a certain number of requests), but it cannot kill them
> because they are in uninterruptable sleep. As more and more incoming
> requests are queued (and fewer and fewer child processes are available
> to serve the requests), the load climbs.
>
> But Neil says the sleeps should be interruptible, despite what the man page
> says.
>
> Trond, as far as you know, should a soft mount be interruptible by SIGINT,
> or should it require a SIGKILL?

The 'TASK_KILLABLE' state is interruptible by any _fatal_ signal. So if the application uses sigaction() to install a handler for SIGINT, then the RPC call will no longer be interruptible by SIGINT.
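
For illustration, a minimal C sketch of the behaviour described above (the file path, buffer size and handler are hypothetical, and assume the file sits on a stalled NFS mount): once a SIGINT handler is installed, SIGINT is no longer fatal to the process, so it cannot wake a task sleeping in TASK_KILLABLE inside the NFS/RPC code; SIGKILL still can.

/* sketch only: the path and sizes are made up for illustration */
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void on_sigint(int sig)
{
    (void)sig;    /* once handled, SIGINT is no longer a fatal signal */
}

int main(void)
{
    struct sigaction sa;
    char buf[4096];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);    /* ^C is now delivered to the handler */

    int fd = open("/mnt/nfs/file", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* If the server is unreachable this read can sleep in TASK_KILLABLE;
     * ^C will no longer interrupt it, but kill -9 (SIGKILL) still will. */
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0)
        perror("read");
    close(fd);
    return 0;
}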

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 17:47:40

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 11:45, Chuck Lever <[email protected]> wrote:

>
> On Mar 6, 2014, at 11:16 AM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 11:13, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 11:02 AM, Trond Myklebust <[email protected]> wrote:
>>>
>>>>
>>>> On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>>>>>
>>>>>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>>>>>> and not try to write anything to nfs.
>>>>>>>
>>>>>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>>>>>
>>>>>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>>>>>
>>>>> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.
>>>>
>>>> Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit...
>>>
>>> If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.
>>>
>>> If anyone else had claimed a testing gap, you would have said "If that were the case, we would have a blatant read bug" and left it at that. But you had to go one needless and provocative step further.
>>>
>>> Stop bullying me, Trond. I've had enough of it.
>>
>> Then stop spreading FUD. That is far from professional too.
>
> FUD is a marketing term, and implies I had intent to deceive. Really?
>
> I expressed a technical opinion, with a degree of uncertainty, just like everyone else does. People who ask questions here are free to take our advice or not, based on their own experience. They are adults, they read "IMO" where it is implied.
>
> It is absolutely your right to say that I'm incorrect, or to clarify something I said. If you have test data that shows "ro,soft,tcp" cannot possibly cause any version of the Linux NFS client to cache corrupt data, show it, without invective. That is an appropriate response to my remark.
>
> Face it, you over-reacted. Again. Knock it off.
>

You clearly don't know what other people are testing with, and you clearly didn't ask anyone before you started telling users that 'soft' is untested. I happen to know a server vendor for which _all_ internal QA tests are done using the 'soft' mount option on the clients. This is done for practical reasons in order to prevent client hangs if the server should panic. I strongly suspect that other QA departments are testing the 'soft' case too.

Acting as if you are an authoritative source on the subject of testing, when you are not and you know that you are not, does constitute intentional deception, yes. ...and no, I don't see anything above to indicate that this was an 'opinion' on the subject of what is being tested, which is precisely why I called it.

There are good reasons to distrust the 'soft' mount option, but lack of testing is not it. The general lack of application support for handling the resulting EIO errors is, however...
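
As a rough sketch of the application-side handling being referred to (the path and the fail-loudly policy below are illustrative assumptions, not something taken from this thread): on a 'soft' mount a major timeout surfaces as EIO, and a reader that never checks for it can end up silently working with incomplete data.

/* sketch only: the filename is made up for illustration */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[8192];
    int fd = open("/srv/nfs/data", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n == 0)
            break;                      /* normal EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal, retry */
            fprintf(stderr, "read: %s%s\n", strerror(errno),
                    errno == EIO ? " (soft mount timed out?)" : "");
            close(fd);
            return 1;                   /* fail loudly, don't use partial data */
        }
        /* ... process n bytes of buf ... */
    }
    close(fd);
    return 0;
}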

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 19:14:52

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Trond,

In this case, it isn't fsync or close that are not getting the i/o error.  It is the write().

And we check the return value of every i/o related command.

We aren't using synchronous because the performance becomes abysmal.

Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call.  I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.

As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.

-----Original Message-----
From: Trond Myklebust <[email protected]>
Date: Thu, 6 Mar 2014 14:06:24
To: <[email protected]>
Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:

> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes.  That's been our experience anyway.

Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.

We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...

> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage).  You'd have that same issue with 'hard' too if it was your appliance that failed.  If the appliance never comes back, those blocks can never be written.
>
> In your case though, you're not writing.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 18:35:19

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

> From: "Jim Rees" <[email protected]>
> Why would a bunch of blocked apaches cause high load and reboot?
What I believe happens is the apache child processes go to serve
these requests and then block in uninterruptible sleep. Thus, there
are fewer and fewer child processes to handle new incoming requests.
Eventually, apache would normally kill said children (e.g. after a
child handles a certain number of requests), but it cannot kill them
because they are in uninterruptible sleep. As more and more incoming
requests are queued (and fewer and fewer child processes are available
to serve the requests), the load climbs.


2014-03-05 20:11:54

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

I prefer hard,intr which lets you interrupt the hung process.

2014-03-06 19:33:10

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

We do call fsync at synchronization points.

The problem is the write() blocks forever (or for an exceptionally long time on the order of hours and days), even with timeo set to say 20 and retrans set to 2.  We see timeout messages in /var/log/messages, but the write continues to pend.  Until we start doing repeated umount -f's.  Then it returns and has an i/o error.

-----Original Message-----
From: Trond Myklebust <[email protected]>
Date: Thu, 6 Mar 2014 14:26:24
To: <[email protected]>
Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Mar 6, 2014, at 14:14, Brian Hawley <[email protected]> wrote:

> Trond,
>
> In this case, it isn't fsync or close that are not getting the i/o error.  It is the write().

My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.

> And we check the return value of every i/o related command.
>
> We aren't using synchronous because the performance becomes abysmal.
>
> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call.  I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>
> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.

Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-18 22:28:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 18, 2014, at 17:50, Andrew Martin <[email protected]> wrote:

> ----- Original Message -----
>> From: "Trond Myklebust" <[email protected]>
>> To: "Andrew Martin" <[email protected]>
>> Cc: "Jim Rees" <[email protected]>, [email protected], "Brown Neil" <[email protected]>, [email protected],
>> [email protected]
>> Sent: Thursday, March 6, 2014 3:01:03 PM
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>
>>
>
> Trond,
>
> This problem has reoccurred, and I have captured the debug output that you requested:
>
> echo 0 >/proc/sys/sunrpc/rpc_debug:
> http://pastebin.com/9juDs2TW
>
> echo w > /proc/sysrq-trigger ; dmesg:
> http://pastebin.com/1vDx9bNf
>
> netstat -tn:
> http://pastebin.com/mjxqjmuL
>
> One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint"
> repeatedly to attempt to send SIGKILL back up to the application. This always
> returned "Device or resource busy" and I was unable to unmount the filesystem
> until I used "mount -l".
>
> I was able to kill -9 all but two of the processes that were blocking in
> uninterruptable sleep. Note that I was able to get lsof output on these
> processes this time, and they all appeared to be blocking on access to a
> single file on the nfs share. If I tried to cat said file from this client,
> my terminal would block:
> open("/path/to/file", O_RDONLY) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0
> mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000
> read(3,
>
> However, I could cat the file just fine from another nfs client. Does this
> additional information shed any light on the source of this problem?
>

Ah... So this machine is acting both as an NFSv3 client and an NFSv4 server?

[1140235.544551] SysRq : Show Blocked State
[1140235.547126] task PC stack pid father
[1140235.547145] rpciod/0 D 0000000000000001 0 833 2 0x00000000
[1140235.547150] ffff8802812a3c20 0000000000000046 0000000000015e00 0000000000015e00
[1140235.547155] ffff880297251ad0 ffff8802812a3fd8 0000000000015e00 ffff880297251700
[1140235.547159] 0000000000015e00 ffff8802812a3fd8 0000000000015e00 ffff880297251ad0
[1140235.547164] Call Trace:
[1140235.547175] [<ffffffff8156a1a5>] schedule_timeout+0x195/0x300
[1140235.547182] [<ffffffff81078130>] ? process_timeout+0x0/0x10
[1140235.547197] [<ffffffffa009ef52>] rpc_shutdown_client+0xc2/0x100 [sunrpc]
[1140235.547203] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
[1140235.547216] [<ffffffffa01aa62c>] put_nfs4_client+0x4c/0xb0 [nfsd]
[1140235.547227] [<ffffffffa01ae669>] nfsd4_cb_probe_done+0x29/0x60 [nfsd]
[1140235.547238] [<ffffffffa00a5d0c>] rpc_exit_task+0x2c/0x60 [sunrpc]
[1140235.547250] [<ffffffffa00a64e6>] __rpc_execute+0x66/0x2a0 [sunrpc]
[1140235.547261] [<ffffffffa00a6750>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
[1140235.547272] [<ffffffffa00a6765>] rpc_async_schedule+0x15/0x20 [sunrpc]
[1140235.547276] [<ffffffff81081ba7>] run_workqueue+0xc7/0x1a0
[1140235.547279] [<ffffffff81081d23>] worker_thread+0xa3/0x110
[1140235.547284] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
[1140235.547287] [<ffffffff81081c80>] ? worker_thread+0x0/0x110
[1140235.547291] [<ffffffff810863d6>] kthread+0x96/0xa0
[1140235.547295] [<ffffffff810141aa>] child_rip+0xa/0x20
[1140235.547299] [<ffffffff81086340>] ? kthread+0x0/0xa0
[1140235.547302] [<ffffffff810141a0>] ? child_rip+0x0/0x20
the above looks bad. The rpciod thread is sleeping, waiting for the rpc client to terminate, and the only task running on that rpc client, according to your rpc_debug output, is the above CB_NULL probe. Deadlock...

Bruce, it looks like the above should have been fixed in Linux 2.6.35 with commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is that correct?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 15:59:47

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:

>
> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>
>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>> and not try to write anything to nfs.
>>
>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>
> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.

Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-03-06 19:06:27

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:

>
> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.

Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.

We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...

>
> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>
> In your case though, you're not writing.
>
>
> -----Original Message-----
> From: Andrew Martin <[email protected]>
> Date: Thu, 6 Mar 2014 10:43:42
> To: Jim Rees<[email protected]>
> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and
> timeouts on newer kernels
>
>> From: "Jim Rees" <[email protected]>
>> Andrew Martin wrote:
>>
>>> From: "Jim Rees" <[email protected]>
>>> Given this is apache, I think if I were doing this I'd use
>>> ro,soft,intr,tcp
>>> and not try to write anything to nfs.
>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>> apache was attempting to do a write or a read, but it seems that
>> tcp,soft,intr
>> was not sufficient to prevent the problem.
>>
>> I had the impression from your original message that you were not using
>> "soft" and were asking if it's safe to use it. Are you saying that even with
>> the "soft" option the apache gets stuck forever?
> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
> when the problem occurred (on several ocassions), so my original question was
> if it would be safe to use a small timeo and retrans values to hopefully
> return I/O errors quickly to the application, rather than blocking forever
> (which causes the high load and inevitable reboot). It sounds like that isn't
> safe, but perhaps there is another way to resolve this problem?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 19:46:50

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

> From: "Trond Myklebust" <[email protected]>
> On Mar 6, 2014, at 13:35, Andrew Martin <[email protected]> wrote:
>
> >> From: "Jim Rees" <[email protected]>
> >> Why would a bunch of blocked apaches cause high load and reboot?
> > What I believe happens is the apache child processes go to serve
> > these requests and then block in uninterruptable sleep. Thus, there
> > are fewer and fewer child processes to handle new incoming requests.
> > Eventually, apache would normally kill said children (e.g after a
> > child handles a certain number of requests), but it cannot kill them
> > because they are in uninterruptable sleep. As more and more incoming
> > requests are queued (and fewer and fewer child processes are available
> > to serve the requests), the load climbs.
>
> Does ‘top’ support this theory? Presumably you should see a handful of
> non-sleeping apache threads dominating the load when it happens.
Yes, it looks like the root apache process is still running:
root 1773 0.0 0.1 244176 16588 ? Ss Feb18 0:42 /usr/sbin/apache2 -k start

All of the others, the children (running as the www-data user), are marked as D.

> Why is the server becoming ‘unavailable’ in the first place? Are you taking
> it down?
I do not know the answer to this. A single NFS server has an export that is
mounted on multiple servers, including this web server. The web server is
running Ubuntu 10.04 LTS 2.6.32-57 with nfs-common 1.2.0. Intermittently, the
NFS mountpoint will become inaccessible on this web server; processes that
attempt to access it will block in uninterruptible sleep. While this is
occurring, the NFS export is still accessible normally from other clients,
so it appears to be related to this particular machine (probably since it is
the last machine running Ubuntu 10.04 and not 12.04). I do not know if this
is a bug in 2.6.32 or another package on the system, but at this time I
cannot upgrade it to 12.04, so I need to find a solution on 10.04.

I attempted to get a backtrace from one of the uninterruptible apache processes:
echo w > /proc/sysrq-trigger

Here's one example:
[1227348.003904] apache2 D 0000000000000000 0 10175 1773 0x00000004
[1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 0000000000015e00
[1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 ffff8801d88f0000
[1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 ffff8801d88f03d0
[1227348.003912] Call Trace:
[1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 [sunrpc]
[1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90
[1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
[1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90
[1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40
[1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc]
[1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc]
[1227348.003949] [<ffffffffa009eb2a>] rpc_run_task+0x3a/0x90 [sunrpc]
[1227348.003953] [<ffffffffa009ec82>] rpc_call_sync+0x42/0x70 [sunrpc]
[1227348.003959] [<ffffffffa013b33b>] T.976+0x4b/0x70 [nfs]
[1227348.003965] [<ffffffffa013bd75>] nfs3_proc_access+0xd5/0x1a0 [nfs]
[1227348.003967] [<ffffffff810fea8f>] ? free_hot_page+0x2f/0x60
[1227348.003969] [<ffffffff8156bd6e>] ? _spin_lock+0xe/0x20
[1227348.003971] [<ffffffff8115b626>] ? dput+0xd6/0x1a0
[1227348.003973] [<ffffffff8115254f>] ? __follow_mount+0x6f/0xb0
[1227348.003978] [<ffffffffa00a7fd4>] ? rpcauth_lookup_credcache+0x1a4/0x270 [sunrpc]
[1227348.003983] [<ffffffffa0125817>] nfs_do_access+0x97/0xf0 [nfs]
[1227348.003989] [<ffffffffa00a87f5>] ? generic_lookup_cred+0x15/0x20 [sunrpc]
[1227348.003994] [<ffffffffa00a7910>] ? rpcauth_lookupcred+0x70/0xc0 [sunrpc]
[1227348.003996] [<ffffffff8115254f>] ? __follow_mount+0x6f/0xb0
[1227348.004001] [<ffffffffa0125915>] nfs_permission+0xa5/0x1e0 [nfs]
[1227348.004003] [<ffffffff81153989>] __link_path_walk+0x99/0xf80
[1227348.004005] [<ffffffff81154aea>] path_walk+0x6a/0xe0
[1227348.004007] [<ffffffff81154cbb>] do_path_lookup+0x5b/0xa0
[1227348.004009] [<ffffffff81148e3a>] ? get_empty_filp+0xaa/0x180
[1227348.004011] [<ffffffff81155c63>] do_filp_open+0x103/0xba0
[1227348.004013] [<ffffffff8156bd6e>] ? _spin_lock+0xe/0x20
[1227348.004015] [<ffffffff812b8055>] ? _atomic_dec_and_lock+0x55/0x80
[1227348.004016] [<ffffffff811618ea>] ? alloc_fd+0x10a/0x150
[1227348.004018] [<ffffffff811454e9>] do_sys_open+0x69/0x170
[1227348.004020] [<ffffffff81145630>] sys_open+0x20/0x30
[1227348.004022] [<ffffffff81013172>] system_call_fastpath+0x16/0x1b

2014-03-06 05:37:29

by NeilBrown

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Wed, 5 Mar 2014 23:03:43 -0600 (CST) Andrew Martin <[email protected]>
wrote:

> ----- Original Message -----
> > From: "NeilBrown" <[email protected]>
> > To: "Andrew Martin" <[email protected]>
> > Cc: [email protected]
> > Sent: Wednesday, March 5, 2014 9:50:42 PM
> > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
> >
> > On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <[email protected]>
> > wrote:
> >
> > > Hello,
> > >
> > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels
> > > (e.g
> > > 3.2 and newer)? Currently using the "defaults" nfs mount options on Ubuntu
> > > 12.04 results in processes blocking forever in uninterruptable sleep if
> > > they
> > > attempt to access a mountpoint while the NFS server is offline. I would
> > > prefer
> > > that NFS simply return an error to the clients after retrying a few times,
> > > however I also cannot have data loss. From the man page, I think these
> > > options
> > > will give that effect?
> > > soft,proto=tcp,timeo=10,retrans=3
> > >
> > > >From my understanding, this will cause NFS to retry the connection 3 times
> > > >(once
> > > per second), and then if all 3 are unsuccessful return an error to the
> > > application. Is this correct? Is there a risk of data loss or corruption by
> > > using "soft" in this way? Or is there a better way to approach this?
> >
> > I think your best bet is to use an auto-mounter so that the filesystem gets
> > unmounted if the server isn't available.
> Would this still succeed in unmounting the filesystem if there are already
> processes requesting files from it (and blocking in uninterruptible sleep)?

The kernel would allow a 'lazy' unmount in this case. I don't know if any
automounter would try a lazy unmount though - I suspect not.

A long time ago I used "amd" which would create symlinks to a separate tree
where the filesystems were mounted. I'm pretty sure that when a server went
away the symlink would disappear even if the unmount failed.
So while any processes accessing the filesystem would block, new processes
would not be able to find the filesystem and so would not block.


>
> > "soft" always implies the risk of data loss. "Nulls Frequently Substituted"
> > as it was described to me very many years ago.
> >
> > Possibly it would be good to have something between 'hard' and 'soft' for
> > cases like yours (you aren't the first to ask).
> >
> > From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm
> >
> > BSDI and OSF /1 also have a spongy option that is similar to hard , except
> > that the stat, lookup, fsstat, readlink, and readdir operations behave
> > like a soft MOUNT .
> >
> > Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed
> > experiment and there are good reasons not to want it.
>
> The problem that sparked this question is a webserver where apache can serve
> files from an NFS mount. If the NFS server becomes unavailable, then the apache
> processes block in uninterruptable sleep and drive the load very high, forcing
> a server restart. It would be better for this case if the mount would simply
> return an error to apache, so that it would give up rather than blocking
> forever and taking down the system. Can such behavior be achieved safely?

If you have a monitoring program that notices this high load you can try
umount -f /mount/point

The "-f" should cause outstanding requests to fail. That doesn't stop more
requests being made though so it might not be completely successful.
Possibly running it several times would help.

mount --move /mount/point /somewhere/safe
for i in {1..15}; do umount -f /somewhere/safe; done

might be even better, if you can get "mount --move" to work. It doesn't work
for me, probably the fault of systemd (isn't everything :-)).

NeilBrown



Attachments:
signature.asc (828.00 B)

2014-03-06 03:48:07

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

NeilBrown wrote:

On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <[email protected]> wrote:

> Andrew Martin wrote:
>
> Isn't intr/nointr deprecated (since kernel 2.6.25)?
>
> It isn't so much that it's deprecated as that it's now the default (except
> that only SIGKILL will work).

Not quite correct. Any signal will work providing its behaviour is to kill
the process. So SIGKILL will always work, and SIGTERM, SIGINT, SIGQUIT, etc.
will work provided they aren't caught or ignored by the process.

If that's true, then the man page is wrong and someone should fix it. I'll
work up a patch if someone can confirm the behavior.

2014-03-06 16:30:20

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Andrew Martin wrote:

> From: "Jim Rees" <[email protected]>
> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
> and not try to write anything to nfs.
I was using tcp,bg,soft,intr when this problem occurred. I do not know if
apache was attempting to do a write or a read, but it seems that tcp,soft,intr
was not sufficient to prevent the problem.

I had the impression from your original message that you were not using
"soft" and were asking if it's safe to use it. Are you saying that even with
the "soft" option the apache gets stuck forever?

2014-03-06 19:29:29

by Ric Wheeler

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On 03/06/2014 09:14 PM, Brian Hawley wrote:
> Trond,
>
> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>
> And we check the return value of every i/o related command.

Checking write() return status means we wrote to the page cache - you must also
fsync() that file to push it out to the target. Do that when it counts, leaving
data in the page cache until you actually need persistence and your performance
should be reasonable.

Doing it the safe way is not free, you will see a performance hit (less so if
you can do batching, etc).
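
A minimal sketch of that pattern (the function name, path and flags are illustrative assumptions): a successful write() only says the data reached the client's page cache, and the fsync() at the synchronization point, plus the final close(), are where a soft-mount timeout (EIO) or other writeback failure actually shows up.

/* sketch only: save_record() and the path are made up for illustration */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

static int save_record(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, data, len) != (ssize_t)len) {
        /* a cached write may "succeed" even if the server is down */
        close(fd);
        return -1;
    }
    if (fsync(fd) < 0) {
        /* the synchronization point: EIO from the server shows up here */
        close(fd);
        return -1;
    }
    return close(fd);    /* close() can also report writeback errors */
}

int main(void)
{
    const char msg[] = "synchronization point reached\n";

    if (save_record("/mnt/nfs/app.log", msg, sizeof(msg) - 1) < 0) {
        perror("save_record");
        return 1;
    }
    return 0;
}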

ric

>
> We aren't using synchronous because the performance becomes abysmal.
>
> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>
> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.
>
> -----Original Message-----
> From: Trond Myklebust <[email protected]>
> Date: Thu, 6 Mar 2014 14:06:24
> To: <[email protected]>
> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:
>
>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.
> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.
>
> We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...
>
>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>>
>> In your case though, you're not writing.
>>
>>
>> -----Original Message-----
>> From: Andrew Martin <[email protected]>
>> Date: Thu, 6 Mar 2014 10:43:42
>> To: Jim Rees<[email protected]>
>> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and
>> timeouts on newer kernels
>>
>>> From: "Jim Rees" <[email protected]>
>>> Andrew Martin wrote:
>>>
>>>> From: "Jim Rees" <[email protected]>
>>>> Given this is apache, I think if I were doing this I'd use
>>>> ro,soft,intr,tcp
>>>> and not try to write anything to nfs.
>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>>> apache was attempting to do a write or a read, but it seems that
>>> tcp,soft,intr
>>> was not sufficient to prevent the problem.
>>>
>>> I had the impression from your original message that you were not using
>>> "soft" and were asking if it's safe to use it. Are you saying that even with
>>> the "soft" option the apache gets stuck forever?
>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
>> when the problem occurred (on several ocassions), so my original question was
>> if it would be safe to use a small timeo and retrans values to hopefully
>> return I/O errors quickly to the application, rather than blocking forever
>> (which causes the high load and inevitable reboot). It sounds like that isn't
>> safe, but perhaps there is another way to resolve this problem?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>


2014-03-05 20:54:32

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 5, 2014, at 3:15 PM, Brian Hawley <[email protected]> wrote:

>
> In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking of the I/O to match the rsize/wsize settings, and possibly the fact that the peer disconnection isn't noticed unless the NFS server resets the connection (i.e. a cable disconnection isn't sufficient).
>
> The inability to get the i/o errors back to the application has been a major pain for us.
>
> On a lark we did find that repeated unmont -f's does get i/o errors back to the application, but isn't our preferred way.
>
>
> -----Original Message-----
> From: Andrew Martin <[email protected]>
> Sender: [email protected]
> Date: Wed, 5 Mar 2014 11:45:24
> To: <[email protected]>
> Subject: Optimal NFS mount options to safely allow interrupts and timeouts
> on newer kernels
>
> Hello,
>
> Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g
> 3.2 and newer)? Currently using the "defaults" nfs mount options on Ubuntu
> 12.04 results in processes blocking forever in uninterruptable sleep if they
> attempt to access a mountpoint while the NFS server is offline. I would prefer
> that NFS simply return an error to the clients after retrying a few times,
> however I also cannot have data loss. From the man page, I think these options
> will give that effect?
> soft,proto=tcp,timeo=10,retrans=3
>
> From my understanding, this will cause NFS to retry the connection 3 times (once
> per second), and then if all 3 are unsuccessful return an error to the
> application. Is this correct? Is there a risk of data loss or corruption by
> using "soft" in this way? Or is there a better way to approach this?

There is always a silent data corruption risk with 'soft'. Using TCP and a long retransmit timeout mitigates the risk, but it is still there. A one second timeout for TCP is very short, and will almost certainly result in trouble, especially if the server or network are slow.

You should be able to ^C any waiting NFS process. Blocking forever is usually the sign of a bug.

In general, NFS is not especially tolerant of server unavailability. You may want to consider some other distributed file system protocol that is more fault-tolerant, or find ways to ensure your NFS servers are always accessible.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-03-06 19:38:19

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

I agree completely that the write() returning only means it's in the page cache.

I agree completely that fsync() result is the only way to know your data is safe.

Neither of those is what I, or the original poster (and what other posters in the past) on this subject are disputing or concerned about.

The issue is, the write() call (in my case - read() in the original poster's case) does NOT return.

We both expect that a soft mounted NFS filesystem should propagate i/o errors back to the application when the retrans/timeo fails (without the filesystem being mounted sync).  But that doesn't happen.  And thus the application blocks indefinitely (or certainly longer than useful).

Why repeated umount -f's eventually get the i/o error back to the caller and thus "unblock" the application, I'm not sure.  But I'd guess it has something to do with having to get entries pending to be written off the queue until it eventually works its way back to the last write() that blocked b/c the cache was full (or something like that).

-----Original Message-----
From: Ric Wheeler <[email protected]>
Date: Thu, 06 Mar 2014 21:29:16
To: <[email protected]>; Trond Myklebust<[email protected]>
Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On 03/06/2014 09:14 PM, Brian Hawley wrote:
> Trond,
>
> In this case, it isn't fsync or close that are not getting the i/o error.  It is the write().
>
> And we check the return value of every i/o related command.

Checking write() return status means we wrote to the page cache - you must also fsync() that file to push it out to the target.  Do that when it counts, leaving data in the page cache until you actually need persistence and your performance should be reasonable.

Doing it the safe way is not free, you will see a performance hit (less so if you can do batching, etc).

ric


2014-03-06 17:44:44

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Why would a bunch of blocked apaches cause high load and reboot?

2014-03-06 19:47:51

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:33, Brian Hawley <[email protected]> wrote:

>
> We do call fsync at synchronization points.
>
> The problem is the write() blocks forever (or for an exceptionally long time on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error.

How much data are you trying to sync? 'soft' won't time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the 'dirty_background_bytes' (and/or 'dirty_bytes') sysctl so that it starts writeback at an earlier time.

Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs?

> -----Original Message-----
> From: Trond Myklebust <[email protected]>
> Date: Thu, 6 Mar 2014 14:26:24
> To: <[email protected]>
> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 14:14, Brian Hawley <[email protected]> wrote:
>
>>
>> Trond,
>>
>> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>
> My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.
>
>> And we check the return value of every i/o related command.
>
>> We aren't using synchronous because the performance becomes abysmal.
>>
>> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>>
>> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.
>
> Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.
>
>> -----Original Message-----
>> From: Trond Myklebust <[email protected]>
>> Date: Thu, 6 Mar 2014 14:06:24
>> To: <[email protected]>
>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>
>>
>> On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:
>>
>>>
>>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.
>>
>> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.
>>
>> We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...
>>
>>>
>>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>>>
>>> In your case though, you're not writing.
>>>
>>>
>>> -----Original Message-----
>>> From: Andrew Martin <[email protected]>
>>> Date: Thu, 6 Mar 2014 10:43:42
>>> To: Jim Rees<[email protected]>
>>> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and
>>> timeouts on newer kernels
>>>
>>>> From: "Jim Rees" <[email protected]>
>>>> Andrew Martin wrote:
>>>>
>>>>> From: "Jim Rees" <[email protected]>
>>>>> Given this is apache, I think if I were doing this I'd use
>>>>> ro,soft,intr,tcp
>>>>> and not try to write anything to nfs.
>>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>>>> apache was attempting to do a write or a read, but it seems that
>>>> tcp,soft,intr
>>>> was not sufficient to prevent the problem.
>>>>
>>>> I had the impression from your original message that you were not using
>>>> "soft" and were asking if it's safe to use it. Are you saying that even with
>>>> the "soft" option the apache gets stuck forever?
>>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
>>> when the problem occurred (on several occasions), so my original question was
>>> if it would be safe to use a small timeo and retrans values to hopefully
>>> return I/O errors quickly to the application, rather than blocking forever
>>> (which causes the high load and inevitable reboot). It sounds like that isn't
>>> safe, but perhaps there is another way to resolve this problem?
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to [email protected]
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>> _________________________________
>> Trond Myklebust
>> Linux NFS client maintainer, PrimaryData
>> [email protected]
>>
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 15:30:35

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

> From: "Brian Hawley" <[email protected]>
>
> I ended up writing a "manage_mounts" script run by cron that compares
> /proc/mounts and the fstab, used ping, and "timeout" messages in
> /var/log/messages to identify filesystems that aren't responding, repeatedly
> do umount -f to force i/o errors back to the calling applications; and when
> missing mounts (in fstab but not /proc/mounts) but were now pingable,
> attempt to remount them.
>
>
> For me, timeo and retrans are necessary, but not sufficient. The chunking to
> rsize/wsize and caching plays a role in how well i/o errors get relayed back
> to the applications doing the i/o.
>
> You will certainly lose data in these scenario's.
>
> It would be fantastic if somehow the timeo and retrans were sufficient (ie
> when they fail, i/o errors get back to the applications that queued that i/o
> (or even the i/o that cause the application to pend because the rsize/wsize
> or cache was full).
>
> You can eliminate some of that behavior with sync/directio, but performance
> becomes abysmal.
>
> I tried "lazy" it didn't provide the desired effect (they unmounted which
> prevented new i/o's; but existing I/o's never got errors).
This is the problem I am having - I can unmount the filesystem with -l, but
once it is unmounted the existing apache processes are still stuck forever.
Does repeatedly running "umount -f" instead of "umount -l" as you describe
return I/O errors back to existing processes and allow them to stop?


> From: "Jim Rees" <[email protected]>
> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
> and not try to write anything to nfs.
I was using tcp,bg,soft,intr when this problem occurred. I do not know if
apache was attempting to do a write or a read, but it seems that tcp,soft,intr
was not sufficient to prevent the problem.

2014-03-06 12:35:50

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
and not try to write anything to nfs.
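For illustration, a hypothetical /etc/fstab entry along those lines (server name, export path, and mount point are all placeholders) could look like:

  nfsserver:/export/www  /var/www/static  nfs  ro,soft,intr,tcp  0  0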

2014-03-18 21:50:44

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

----- Original Message -----
> From: "Trond Myklebust" <[email protected]>
> To: "Andrew Martin" <[email protected]>
> Cc: "Jim Rees" <[email protected]>, [email protected], "Brown Neil" <[email protected]>, [email protected],
> [email protected]
> Sent: Thursday, March 6, 2014 3:01:03 PM
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 15:45, Andrew Martin <[email protected]> wrote:
>
> > ----- Original Message -----
> >> From: "Trond Myklebust" <[email protected]>
> >>> I attempted to get a backtrace from one of the uninterruptable apache
> >>> processes:
> >>> echo w > /proc/sysrq-trigger
> >>>
> >>> Here's one example:
> >>> [1227348.003904] apache2 D 0000000000000000 0 10175 1773
> >>> 0x00000004
> >>> [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00
> >>> 0000000000015e00
> >>> [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00
> >>> ffff8801d88f0000
> >>> [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00
> >>> ffff8801d88f03d0
> >>> [1227348.003912] Call Trace:
> >>> [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
> >>> [sunrpc]
> >>> [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40
> >>> [sunrpc]
> >>> [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90
> >>> [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
> >>> [sunrpc]
> >>> [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90
> >>> [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40
> >>> [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc]
> >>> [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc]
> >>
> >> That basically means that the process is hanging in the RPC layer,
> >> somewhere
> >> in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’
> >> user should give us a dump of which state these RPC calls are in. Can you
> >> please try that?
> > Yes I will definitely run that the next time it happens, but since it
> > occurs
> > sporadically (and I have not yet found a way to reproduce it on demand), it
> > could be days before it occurs again. I'll also run "netstat -tn" to check
> > the
> > TCP connections the next time this happens.
>
> If you are comfortable applying patches and compiling your own kernels, then
> you might want to try applying the fix for a certain out-of-socket-buffer
> race that Neil reported, and that I suspect you may be hitting. The patch
> has been sent to the ‘stable kernel’ series, and so should appear soon in
> Debian’s own kernels, but if this is bothering you now, then go for it…
>
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=06ea0bfe6e6043cb56a78935a19f6f8ebc636226
>

Trond,

This problem has reoccurred, and I have captured the debug output that you requested:

echo 0 >/proc/sys/sunrpc/rpc_debug:
http://pastebin.com/9juDs2TW

echo w > /proc/sysrq-trigger ; dmesg:
http://pastebin.com/1vDx9bNf

netstat -tn:
http://pastebin.com/mjxqjmuL

One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint"
repeatedly to attempt to send SIGKILL back up to the application. This always
returned "Device or resource busy" and I was unable to unmount the filesystem
until I used "umount -l".

I was able to kill -9 all but two of the processes that were blocking in
uninterruptable sleep. Note that I was able to get lsof output on these
processes this time, and they all appeared to be blocking on access to a
single file on the nfs share. If I tried to cat said file from this client,
my terminal would block:
open("/path/to/file", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0
mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000
read(3,

However, I could cat the file just fine from another nfs client. Does this
additional information shed any light on the source of this problem?

Thanks,

Andrew






2014-03-06 16:43:56

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

> From: "Jim Rees" <[email protected]>
> Andrew Martin wrote:
>
> > From: "Jim Rees" <[email protected]>
> > Given this is apache, I think if I were doing this I'd use
> > ro,soft,intr,tcp
> > and not try to write anything to nfs.
> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
> apache was attempting to do a write or a read, but it seems that
> tcp,soft,intr
> was not sufficient to prevent the problem.
>
> I had the impression from your original message that you were not using
> "soft" and were asking if it's safe to use it. Are you saying that even with
> the "soft" option the apache gets stuck forever?
Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
when the problem occurred (on several occasions), so my original question was
if it would be safe to use a small timeo and retrans values to hopefully
return I/O errors quickly to the application, rather than blocking forever
(which causes the high load and inevitable reboot). It sounds like that isn't
safe, but perhaps there is another way to resolve this problem?

2014-03-06 09:37:59

by Ric Wheeler

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On 03/05/2014 10:15 PM, Brian Hawley wrote:
> In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and the chunking of I/O to match the rsize/wsize settings; and possibly the fact that the peer disconnection isn't noticed unless the nfs server resets (ie cable disconnection isn't sufficient).
>
> The inability to get the i/o errors back to the application has been a major pain for us.
>
> On a lark we did find that repeated unmont -f's does get i/o errors back to the application, but isn't our preferred way.

The key to getting IO errors promptly is to make sure you use fsync/fdatasync (and
so on) when you hit those points in your application where you want to be able to
recover if things crash, get disconnected, etc.

Those calls will push the data out of the page cache while your application is still
around, which is critical for any potential need to do recovery.

Note that this is not just an issue with NFS; any file system (including local
file systems) normally completes the write request when the IO hits the page
cache. When that page eventually gets sent down to the permanent storage device
(NFS server, local disk, etc), your process is potentially no longer around and
certainly not waiting for IO errors in the original write call :)

To make this even trickier, calls like fsync() that persist data have
a substantial performance impact, so you don't want to over-use them. (Try
writing a 1GB file with an fsync() before close and comparing that to writing a
1GB file opened in O_DIRECT|O_SYNC mode for the worst case, for example :))
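A rough way to see that difference for yourself, if you are curious (the paths, size, and block size are arbitrary):

  # buffered writes with a single flush at the end
  dd if=/dev/zero of=/mnt/nfs/test1 bs=1M count=1024 conv=fsync
  # every write forced synchronous and uncached - the worst case
  dd if=/dev/zero of=/mnt/nfs/test2 bs=1M count=1024 oflag=direct,sync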

Ric


2014-03-28 22:00:51

by J. Bruce Fields

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Tue, Mar 18, 2014 at 06:27:57PM -0400, Trond Myklebust wrote:
>
> On Mar 18, 2014, at 17:50, Andrew Martin <[email protected]> wrote:
>
> > ----- Original Message -----
> >> From: "Trond Myklebust" <[email protected]>
> >> To: "Andrew Martin" <[email protected]>
> >> Cc: "Jim Rees" <[email protected]>, [email protected], "Brown Neil" <[email protected]>, [email protected],
> >> [email protected]
> >> Sent: Thursday, March 6, 2014 3:01:03 PM
> >> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
> >>
> >>
> >
> > Trond,
> >
> > This problem has reoccurred, and I have captured the debug output that you requested:
> >
> > echo 0 >/proc/sys/sunrpc/rpc_debug:
> > http://pastebin.com/9juDs2TW
> >
> > echo w > /proc/sysrq-trigger ; dmesg:
> > http://pastebin.com/1vDx9bNf
> >
> > netstat -tn:
> > http://pastebin.com/mjxqjmuL
> >
> > One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint"
> > repeatedly to attempt to send SIGKILL back up to the application. This always
> > returned "Device or resource busy" and I was unable to unmount the filesystem
> > until I used "umount -l".
> >
> > I was able to kill -9 all but two of the processes that were blocking in
> > uninterruptable sleep. Note that I was able to get lsof output on these
> > processes this time, and they all appeared to be blocking on access to a
> > single file on the nfs share. If I tried to cat said file from this client,
> > my terminal would block:
> > open("/path/to/file", O_RDONLY) = 3
> > fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0
> > mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000
> > read(3,
> >
> > However, I could cat the file just fine from another nfs client. Does this
> > additional information shed any light on the source of this problem?
> >
>
> Ah… So this machine is acting both as an NFSv3 client and an NFSv4 server?
>
> • [1140235.544551] SysRq : Show Blocked State
> • [1140235.547126] task PC stack pid father
> • [1140235.547145] rpciod/0 D 0000000000000001 0 833 2 0x00000000
> • [1140235.547150] ffff8802812a3c20 0000000000000046 0000000000015e00 0000000000015e00
> • [1140235.547155] ffff880297251ad0 ffff8802812a3fd8 0000000000015e00 ffff880297251700
> • [1140235.547159] 0000000000015e00 ffff8802812a3fd8 0000000000015e00 ffff880297251ad0
> • [1140235.547164] Call Trace:
> • [1140235.547175] [<ffffffff8156a1a5>] schedule_timeout+0x195/0x300
> • [1140235.547182] [<ffffffff81078130>] ? process_timeout+0x0/0x10
> • [1140235.547197] [<ffffffffa009ef52>] rpc_shutdown_client+0xc2/0x100 [sunrpc]
> • [1140235.547203] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
> • [1140235.547216] [<ffffffffa01aa62c>] put_nfs4_client+0x4c/0xb0 [nfsd]
> • [1140235.547227] [<ffffffffa01ae669>] nfsd4_cb_probe_done+0x29/0x60 [nfsd]
> • [1140235.547238] [<ffffffffa00a5d0c>] rpc_exit_task+0x2c/0x60 [sunrpc]
> • [1140235.547250] [<ffffffffa00a64e6>] __rpc_execute+0x66/0x2a0 [sunrpc]
> • [1140235.547261] [<ffffffffa00a6750>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
> • [1140235.547272] [<ffffffffa00a6765>] rpc_async_schedule+0x15/0x20 [sunrpc]
> • [1140235.547276] [<ffffffff81081ba7>] run_workqueue+0xc7/0x1a0
> • [1140235.547279] [<ffffffff81081d23>] worker_thread+0xa3/0x110
> • [1140235.547284] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
> • [1140235.547287] [<ffffffff81081c80>] ? worker_thread+0x0/0x110
> • [1140235.547291] [<ffffffff810863d6>] kthread+0x96/0xa0
> • [1140235.547295] [<ffffffff810141aa>] child_rip+0xa/0x20
> • [1140235.547299] [<ffffffff81086340>] ? kthread+0x0/0xa0
> • [1140235.547302] [<ffffffff810141a0>] ? child_rip+0x0/0x20
>
> the above looks bad. The rpciod thread is sleeping, waiting for the rpc client to terminate, and the only task running on that rpc client, according to your rpc_debug output is the above CB_NULL probe. Deadlock...
>
> Bruce, it looks like the above should have been fixed in Linux 2.6.35 with commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is that correct?

Yes, that definitely looks it would explain the bug. And the sysrq
trace shows 2.6.32-57.

Andrew Martin, can you confirm that the problem is no longer
reproducible on a kernel with that patch applied?
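If it helps, a quick way to check whether a given kernel tree already carries that fix (just a sketch, run from a kernel git checkout):

  git describe --contains 9045b4b9f7f3                          # prints the first tag containing the commit
  git merge-base --is-ancestor 9045b4b9f7f3 HEAD && echo "fix present"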

--b.

2014-03-05 21:11:29

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Andrew Martin wrote:

Isn't intr/nointr deprecated (since kernel 2.6.25)?

It isn't so much that it's deprecated as that it's now the default (except
that only SIGKILL will work).

2014-03-06 05:47:24

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

I ended up writing a "manage_mounts" script run by cron that compares
/proc/mounts and the fstab, used ping, and "timeout" messages in
/var/log/messages to identify filesystems that aren't responding, repeatedly
do umount -f to force i/o errors back to the calling applications; and when
missing mounts (in fstab but not /proc/mounts) but were now pingable,
attempt to remount them.


For me, timeo and retrans are necessary, but not sufficient.  The chunking to
rsize/wsize and caching plays a role in how well i/o errors get relayed back
to the applications doing the i/o.

You will certainly lose data in these scenario's.

It would be fantastic if somehow the timeo and retrans were sufficient (ie
when they fail, i/o errors get back to the applications that queued that i/o
(or even the i/o that cause the application to pend because the rsize/wsize
or cache was full).

You can eliminate some of that behavior with sync/directio, but performance
becomes abysmal.

I tried "lazy" it didn't provide the desired effect (they unmounted which
prevented new i/o's; but existing I/o's never got errors).


-----Original Message-----
From: NeilBrown <[email protected]>
Sender: [email protected]
Date: Thu, 6 Mar 2014 16:37:21
To: Andrew Martin <[email protected]>
Cc: <[email protected]>
Subject: Re: Optimal NFS mount options to safely allow interrupts and
 timeouts on newer kernels

On Wed, 5 Mar 2014 23:03:43 -0600 (CST) Andrew Martin <[email protected]>
wrote:

> > I think your best bet is to use an auto-mounter so that the filesystem gets
> > unmounted if the server isn't available.
> Would this still succeed in unmounting the filesystem if there are already
> processes requesting files from it (and blocking in uninterruptable sleep)?

The kernel would allow a 'lazy' unmount in this case.  I don't know if any
automounter would try a lazy unmount though - I suspect not.

A long time ago I used "amd" which would create symlinks to a separate tree
where the filesystems were mounted.  I'm pretty sure that when a server went
away the symlink would disappear even if the unmount failed.
So while any processes accessing the filesystem would block, new processes
would not be able to find the filesystem and so would not block.

> The problem that sparked this question is a webserver where apache can serve
> files from an NFS mount. If the NFS server becomes unavailable, then the apache
> processes block in uninterruptable sleep and drive the load very high, forcing
> a server restart. It would be better for this case if the mount would simply
> return an error to apache, so that it would give up rather than blocking
> forever and taking down the system. Can such behavior be achieved safely?

If you have a monitoring program that notices this high load you can try
  umount -f /mount/point

The "-f" should cause outstanding requests to fail.  That doesn't stop more
requests being made though so it might not be completely successful.
Possibly running it several times would help.

  mount --move /mount/point /somewhere/safe
  for i in {1..15}; do umount -f /somewhere/safe; done

might be even better, if you can get "mount --move" to work.  It doesn't work
for me, probably the fault of systemd (isn't everything :-)).

NeilBrown


2014-03-06 20:46:14

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

----- Original Message -----
> From: "Trond Myklebust" <[email protected]>
> > I attempted to get a backtrace from one of the uninterruptable apache
> > processes:
> > echo w > /proc/sysrq-trigger
> >
> > Here's one example:
> > [1227348.003904] apache2 D 0000000000000000 0 10175 1773
> > 0x00000004
> > [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00
> > 0000000000015e00
> > [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00
> > ffff8801d88f0000
> > [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00
> > ffff8801d88f03d0
> > [1227348.003912] Call Trace:
> > [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
> > [sunrpc]
> > [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40
> > [sunrpc]
> > [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90
> > [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
> > [sunrpc]
> > [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90
> > [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40
> > [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc]
> > [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc]
>
> That basically means that the process is hanging in the RPC layer, somewhere
> in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’
> user should give us a dump of which state these RPC calls are in. Can you
> please try that?
Yes I will definitely run that the next time it happens, but since it occurs
sporadically (and I have not yet found a way to reproduce it on demand), it
could be days before it occurs again. I'll also run "netstat -tn" to check the
TCP connections the next time this happens.

2014-03-06 04:38:05

by NeilBrown

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Wed, 5 Mar 2014 22:47:27 -0500 Jim Rees <[email protected]> wrote:

> NeilBrown wrote:
>
> On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <[email protected]> wrote:
>
> > Andrew Martin wrote:
> >
> > Isn't intr/nointr deprecated (since kernel 2.6.25)?
> >
> > It isn't so much that it's deprecated as that it's now the default (except
> > that only SIGKILL will work).
>
> Not quite correct. Any signal will work providing its behaviour is to kill
> the process. So SIGKILL will always work, and SIGTERM SIGINT SIGQUIT etc
> will work provided they aren't caught or ignored by the process.
>
> If that's true, then the man page is wrong and someone should fix it. I'll
> work up a patch if someone can confirm the behavior.

I just mounted a filesystem, turned off my network connection, ran "ls -l" and
then tried to kill the "ls"....
To my surprise, only SIGKILL worked.
I looked more closely and discovered that "ls" catches SIGHUP SIGINT SIGQUIT
SIGTERM, so those signals won't kill it....

So I tried to "cat" a file on the NFS filesystem. 'cat' doesn't catch any
signals. SIGHUP SIGTERM SIGINT all worked on 'cat'.
'df' also responds to 'SIGINT'.
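For anyone who wants to repeat the experiment, the steps were roughly as follows (a sketch only; the server, interface, and path names will differ):

  mount -t nfs server:/export /mnt/test
  ip link set eth0 down              # simulate the server disappearing
  cat /mnt/test/somefile &           # cat installs no signal handlers, so it hangs
  kill -INT %1                       # SIGINT terminates it; the 'ls -l' case needed SIGKILL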

It would be nice if 'ls' only caught signals while printing (so it can
restore the default colour) and didn't during 'stat' and 'readdir'. But
maybe no-one cares enough.

So the man page is not quite accurate.

NeilBrown



2014-03-06 15:33:10

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:

>
> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>
>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>> and not try to write anything to nfs.
>
> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.

What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 20:38:15

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 12:47 PM, Trond Myklebust <[email protected]> wrote:

>
> On Mar 6, 2014, at 11:45, Chuck Lever <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 11:16 AM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 11:13, Chuck Lever <[email protected]> wrote:
>>>
>>>>
>>>> On Mar 6, 2014, at 11:02 AM, Trond Myklebust <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>>>>>>> and not try to write anything to nfs.
>>>>>>>>
>>>>>>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>>>>>>
>>>>>>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>>>>>>
>>>>>> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.
>>>>>
>>>>> Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit…
>>>>
>>>> If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.
>>>>
>>>> If anyone else had claimed a testing gap, you would have said "If that were the case, we would have a blatant read bug" and left it at that. But you had to go one needless and provocative step further.
>>>>
>>>> Stop bullying me, Trond. I've had enough of it.
>>>
>>> Then stop spreading FUD. That is far from professional too.
>>
>> FUD is a marketing term, and implies I had intent to deceive. Really?
>>
>> I expressed a technical opinion, with a degree of uncertainty, just like everyone else does. People who ask questions here are free to take our advice or not, based on their own experience. They are adults, they read "IMO" where it is implied.
>>
>> It is absolutely your right to say that I'm incorrect, or to clarify something I said. If you have test data that shows "ro,soft,tcp" cannot possibly cause any version of the Linux NFS client to cache corrupt data, show it, without invective. That is an appropriate response to my remark.
>>
>> Face it, you over-reacted. Again. Knock it off.
>>
>
> You clearly don't know what other people are testing with, and you clearly didn't ask anyone before you started telling users that 'soft' is untested.

I suggested in a reply TO YOU that perhaps this use case was untested...

> I happen to know a server vendor for which _all_ internal QA tests are done using the 'soft' mount option on the clients. This is done for practical reasons in order to prevent client hangs if the server should panic.

… and that's all you needed to say in response. But you have chosen to turn it into a shouting match because you read more into my words than was there.

> I strongly suspect that other QA departments are testing the 'soft' case too.

"I strongly suspect" means you don't know for sure either.

Clearly Andrew and Brian are reporting a problem here, whether or not it's related to data corruption, and vendor testing has not found it yet, apparently. I'm not surprised. Testing is difficult, and too often it finds only exactly what you're looking for.

(On the technical issue, just using 'soft' does not constitute a robust test. Repeatedly exercising the soft timeout is not the same as having 'soft' in play "just in case" the server panics.)

> Acting as if you are an authoritative source on the subject of testing, when you are not and you know that you are not, does constitute intentional deception, yes.

No-one is "acting like an authority on testing," except maybe you.

What possible reason could I have for deceiving anyone about my authority or anything else? Do you understand that calling someone a liar in public is deeply offensive? Do you understand how unnecessarily humiliating your words are?

Assuming that you do understand, the level of inappropriate heat here is a sign that you have a long-standing personal issue with me. You seem to always read my words as a challenge to your authority, and that is never what I intend. There is nothing I can do about your mistaken impression of me.

> …and no, I don't see anything above to indicate that this was an 'opinion' on the subject of what is being tested, which is precisely why I called it.

LOL. You "called it" because my claim that testing wasn't sufficient touched a nerve.

Are you really suggesting we all need to add "IMO" and a giant .sig disclaimer to everything we post to this list, or else Trond will swat us with a rolled up newspaper if he doesn't happen to agree?

--
Chuck Lever




2014-03-06 16:13:39

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 11:02 AM, Trond Myklebust <[email protected]> wrote:

>
> On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>>
>>>>
>>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>>
>>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>>> and not try to write anything to nfs.
>>>>
>>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>>
>>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>>
>> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.
>
> Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit…

If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.

If anyone else had claimed a testing gap, you would have said "If that were the case, we would have a blatant read bug" and left it at that. But you had to go one needless and provocative step further.

Stop bullying me, Trond. I've had enough of it.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-03-06 18:56:37

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Using umount -f repeatedly did eventually get i/o errors back to all the read/writes.

I understand Ric's comment about using fsync, and we do in fact use fsync at data synchronization points (like close, seeks, changes from write to read, etc -- ours is a sequential i/o application).  But it is these writes and reads that end up hung most of the time; not an fsync call.  I suspect because it is the writes that eventually get the cache/buffers to the point where that write has to block until the cache gets some block flushed to make room.


2014-03-06 18:50:53

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 13:35, Andrew Martin <[email protected]> wrote:

>> From: "Jim Rees" <[email protected]>
>> Why would a bunch of blocked apaches cause high load and reboot?
> What I believe happens is the apache child processes go to serve
> these requests and then block in uninterruptable sleep. Thus, there
> are fewer and fewer child processes to handle new incoming requests.
> Eventually, apache would normally kill said children (e.g after a
> child handles a certain number of requests), but it cannot kill them
> because they are in uninterruptable sleep. As more and more incoming
> requests are queued (and fewer and fewer child processes are available
> to serve the requests), the load climbs.

Does 'top' support this theory? Presumably you should see a handful of non-sleeping apache threads dominating the load when it happens.
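For instance, something along these lines (just a sketch; the process name is assumed to be apache2) would show how many workers are sitting in uninterruptible sleep ('D' state) while the load is climbing:

  ps -eo state,pid,comm | grep '^D' | grep -c apache2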

Why is the server becoming 'unavailable' in the first place? Are you taking it down?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 20:34:30

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

We're not intending to aggressively cache.  There just happens to be a lot of free memory.


-----Original Message-----
From: Trond Myklebust <[email protected]>
Sender: [email protected]
Date: Thu, 6 Mar 2014 15:31:33
To: <[email protected]>
Cc: <[email protected]>; Andrew Martin <[email protected]>; Jim Rees <[email protected]>; Brown Neil <[email protected]>; <[email protected]>
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:56, Brian Hawley <[email protected]> wrote:

> Given that the systems typically have 16GB's, the memory available for cache is usually around 13GB.
>
> Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application).
>
> Dirty ratio is 50 and dirty background ratio is 10.

That means you can have up to 8GB to push out in one go. You can hardly blame NFS for being slow in that situation.
Why do you need to cache these writes so aggressively? Is the data being edited and rewritten multiple times in the page cache before you want to push it to disk?

> We set these to try to keep the data from cache always being pushed out.
>
> No oopses.  Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data.
>
> It's unfortunate that when the i/o error is detected that the other packets can't just timeout right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail, so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard".

Right, but the RPC layer does not know that these are all writes to the same file, and it can't be expected to know why the server isn't replying. For instance, I've known a single 'unlink' RPC call to take 17 minutes to complete on a server that had a lot of cleanup to do on that file; during that time, the server was happy to take RPC requests for other files...


2014-03-06 05:03:49

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

----- Original Message -----
> From: "NeilBrown" <[email protected]>
> To: "Andrew Martin" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, March 5, 2014 9:50:42 PM
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
> On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <[email protected]>
> wrote:
>
> > Hello,
> >
> > Is it safe to use the "soft" mount option with proto=tcp on newer kernels
> > (e.g
> > 3.2 and newer)? Currently using the "defaults" nfs mount options on Ubuntu
> > 12.04 results in processes blocking forever in uninterruptable sleep if
> > they
> > attempt to access a mountpoint while the NFS server is offline. I would
> > prefer
> > that NFS simply return an error to the clients after retrying a few times,
> > however I also cannot have data loss. From the man page, I think these
> > options
> > will give that effect?
> > soft,proto=tcp,timeo=10,retrans=3
> >
> > >From my understanding, this will cause NFS to retry the connection 3 times
> > >(once
> > per second), and then if all 3 are unsuccessful return an error to the
> > application. Is this correct? Is there a risk of data loss or corruption by
> > using "soft" in this way? Or is there a better way to approach this?
>
> I think your best bet is to use an auto-mounter so that the filesystem gets
> unmounted if the server isn't available.
Would this still succeed in unmounting the filesystem if there are already
processes requesting files from it (and blocking in uninterruptable sleep)?

> "soft" always implies the risk of data loss. "Nulls Frequently Substituted"
> as it was described to very many years ago.
>
> Possibly it would be good to have something between 'hard' and 'soft' for
> cases like yours (you aren't the first to ask).
>
> From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm
>
> BSDI and OSF /1 also have a spongy option that is similar to hard , except
> that the stat, lookup, fsstat, readlink, and readdir operations behave
> like a soft MOUNT .
>
> Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed
> experiment and there are good reasons not to want it.

The problem that sparked this question is a webserver where apache can serve
files from an NFS mount. If the NFS server becomes unavailable, then the apache
processes block in uninterruptable sleep and drive the load very high, forcing
a server restart. It would be better for this case if the mount would simply
return an error to apache, so that it would give up rather than blocking
forever and taking down the system. Can such behavior be achieved safely?
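
For reference, the options under discussion would look like this as a mount invocation; the server name, export and mountpoint below are placeholders, not details from this thread:

  mount -t nfs -o soft,proto=tcp,timeo=10,retrans=3 nfsserver:/export /mnt/export

Per nfs(5), timeo is in tenths of a second, so timeo=10,retrans=3 means roughly a one-second initial timeout and three retransmissions before a major timeout is declared; whether that is safe is exactly what the rest of the thread debates.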

2014-03-06 16:45:59

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 11:16 AM, Trond Myklebust <[email protected]> wrote:

>
> On Mar 6, 2014, at 11:13, Chuck Lever <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 11:02 AM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:
>>>
>>>>
>>>> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>>>>
>>>>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>>>>> and not try to write anything to nfs.
>>>>>>
>>>>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>>>>
>>>>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>>>>
>>>> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.
>>>
>>> Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit...
>>
>> If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.
>>
>> If anyone else had claimed a testing gap, you would have said "If that were the case, we would have a blatant read bug" and left it at that. But you had to go one needless and provocative step further.
>>
>> Stop bullying me, Trond. I've had enough of it.
>
> Then stop spreading FUD. That is far from professional too.

FUD is a marketing term, and implies I had intent to deceive. Really?

I expressed a technical opinion, with a degree of uncertainty, just like everyone else does. People who ask questions here are free to take our advice or not, based on their own experience. They are adults, they read "IMO" where it is implied.

It is absolutely your right to say that I'm incorrect, or to clarify something I said. If you have test data that shows "ro,soft,tcp" cannot possibly cause any version of the Linux NFS client to cache corrupt data, show it, without invective. That is an appropriate response to my remark.

Face it, you over-reacted. Again. Knock it off.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




2014-03-06 20:31:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:56, Brian Hawley <[email protected]> wrote:

>
> Given that the systems typically have 16GB's, the memory available for cache is usually around 13GB.
>
> Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application).
>
> Dirty ratio is 50 and dirty background ratio is 10.

That means you can have up to 8GB to push out in one go. You can hardly blame NFS for being slow in that situation.
Why do you need to cache these writes so aggressively? Is the data being edited and rewritten multiple times in the page cache before you want to push it to disk?

> We set these to try to keep the data from cache always being pushed out.
>
> No oopses. Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data.
>
> It's unfortunate that when the i/o error is detected that the other packets can't just timeout right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard".

Right, but the RPC layer does not know that these are all writes to the same file, and it can't be expected to know why the server isn't replying. For instance, I've known a single 'unlink' RPC call to take 17 minutes to complete on a server that had a lot of cleanup to do on that file; during that time, the server was happy to take RPC requests for other files...


> -----Original Message-----
> From: Trond Myklebust <[email protected]>
> Sender: [email protected]
> Date: Thu, 6 Mar 2014 14:47:48
> To: <[email protected]>
> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 14:33, Brian Hawley <[email protected]> wrote:
>
>>
>> We do call fsync at synchronization points.
>>
>> The problem is the write() blocks forever (or for an exceptionally long time on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error.
>
> How much data are you trying to sync? "soft" won't time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the "dirty_background_bytes" (and/or "dirty_bytes") sysctl so that it starts writeback at an earlier time.
>
> Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs?
>
>> -----Original Message-----
>> From: Trond Myklebust <[email protected]>
>> Date: Thu, 6 Mar 2014 14:26:24
>> To: <[email protected]>
>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>
>>
>> On Mar 6, 2014, at 14:14, Brian Hawley <[email protected]> wrote:
>>
>>>
>>> Trond,
>>>
>>> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>>
>> My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.
>>
>>> And we check the return value of every i/o related command.
>>
>>> We aren't using synchronous because the performance becomes abysmal.
>>>
>>> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>>>
>>> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.
>>
>> Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.
>>
>>> -----Original Message-----
>>> From: Trond Myklebust <[email protected]>
>>> Date: Thu, 6 Mar 2014 14:06:24
>>> To: <[email protected]>
>>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>>
>>>
>>> On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:
>>>
>>>>
>>>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.
>>>
>>> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.
>>>
>>> We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...
>>>
>>>>
>>>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>>>>
>>>> In your case though, you're not writing.
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Andrew Martin <[email protected]>
>>>> Date: Thu, 6 Mar 2014 10:43:42
>>>> To: Jim Rees<[email protected]>
>>>> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
>>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and
>>>> timeouts on newer kernels
>>>>
>>>>> From: "Jim Rees" <[email protected]>
>>>>> Andrew Martin wrote:
>>>>>
>>>>>> From: "Jim Rees" <[email protected]>
>>>>>> Given this is apache, I think if I were doing this I'd use
>>>>>> ro,soft,intr,tcp
>>>>>> and not try to write anything to nfs.
>>>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>>>>> apache was attempting to do a write or a read, but it seems that
>>>>> tcp,soft,intr
>>>>> was not sufficient to prevent the problem.
>>>>>
>>>>> I had the impression from your original message that you were not using
>>>>> "soft" and were asking if it's safe to use it. Are you saying that even with
>>>>> the "soft" option the apache gets stuck forever?
>>>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
>>>> when the problem occurred (on several occasions), so my original question was
>>>> if it would be safe to use a small timeo and retrans values to hopefully
>>>> return I/O errors quickly to the application, rather than blocking forever
>>>> (which causes the high load and inevitable reboot). It sounds like that isn't
>>>> safe, but perhaps there is another way to resolve this problem?
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>> the body of a message to [email protected]
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> _________________________________
>>> Trond Myklebust
>>> Linux NFS client maintainer, PrimaryData
>>> [email protected]
>>>
>>
>> _________________________________
>> Trond Myklebust
>> Linux NFS client maintainer, PrimaryData
>> [email protected]
>>
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
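
A minimal sketch of the writeback tuning Trond is describing; the byte values here are arbitrary examples, not recommendations from the thread:

  # start background writeback after ~64MB of dirty data, throttle writers at ~512MB
  sysctl -w vm.dirty_background_bytes=67108864
  sysctl -w vm.dirty_bytes=536870912

Per Documentation/sysctl/vm.txt, setting the *_bytes variants overrides the corresponding *_ratio settings, so far less dirty data can pile up before the client starts (and, with "soft", times out) its writeback.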


2014-03-06 18:49:11

by Jim Rees

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Andrew Martin wrote:

> From: "Jim Rees" <[email protected]>
> Why would a bunch of blocked apaches cause high load and reboot?
What I believe happens is the apache child processes go to serve
these requests and then block in uninterruptable sleep. Thus, there
are fewer and fewer child processes to handle new incoming requests.
Eventually, apache would normally kill said children (e.g after a
child handles a certain number of requests), but it cannot kill them
because they are in uninterruptable sleep. As more and more incoming
requests are queued (and fewer and fewer child processes are available
to serve the requests), the load climbs.

But Neil says the sleeps should be interruptible, despite what the man page
says.

Trond, as far as you know, should a soft mount be interruptible by SIGINT,
or should it require a SIGKILL?
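
One way to confirm the picture described above is to list the processes stuck in uninterruptable sleep; this command is only an illustration, not something from the thread:

  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

With an unreachable NFS server you would expect to see the apache children in the D state with an nfs/sunrpc wait channel.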

2014-03-05 20:15:55

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking to turn I/O into requests matching the rsize/wsize settings, and possibly the fact that the peer disconnection isn't noticed unless the nfs server resets (ie cable disconnection isn't sufficient).

The inability to get the i/o errors back to the application has been a major pain for us.

On a lark we did find that repeated umount -f's does get i/o errors back to the application, but that isn't our preferred way.


2014-03-06 19:52:39

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:46, Andrew Martin <[email protected]> wrote:

>> From: "Trond Myklebust" <[email protected]>
>> On Mar 6, 2014, at 13:35, Andrew Martin <[email protected]> wrote:
>>
>>>> From: "Jim Rees" <[email protected]>
>>>> Why would a bunch of blocked apaches cause high load and reboot?
>>> What I believe happens is the apache child processes go to serve
>>> these requests and then block in uninterruptable sleep. Thus, there
>>> are fewer and fewer child processes to handle new incoming requests.
>>> Eventually, apache would normally kill said children (e.g after a
>>> child handles a certain number of requests), but it cannot kill them
>>> because they are in uninterruptable sleep. As more and more incoming
>>> requests are queued (and fewer and fewer child processes are available
>>> to serve the requests), the load climbs.
>>
>> Does "top" support this theory? Presumably you should see a handful of
>> non-sleeping apache threads dominating the load when it happens.
> Yes, it looks like the root apache process is still running:
> root 1773 0.0 0.1 244176 16588 ? Ss Feb18 0:42 /usr/sbin/apache2 -k start
>
> All of the others, the children (running as the www-data user), are marked as D.
>
>> Why is the server becoming "unavailable" in the first place? Are you taking
>> it down?
> I do not know the answer to this. A single NFS server has an export that is
> mounted on multiple servers, including this web server. The web server is
> running Ubuntu 10.04 LTS 2.6.32-57 with nfs-common 1.2.0. Intermittently, the
> NFS mountpoint will become inaccessible on this web server; processes that
> attempt to access it will block in uninterruptable sleep. While this is
> occurring, the NFS export is still accessible normally from other clients,
> so it appears to be related to this particular machine (probably since it is
> the last machine running Ubuntu 10.04 and not 12.04). I do not know if this
> is a bug in 2.6.32 or another package on the system, but at this time I
> cannot upgrade it to 12.04, so I need to find a solution on 10.04.
>
> I attempted to get a backtrace from one of the uninterruptable apache processes:
> echo w > /proc/sysrq-trigger
>
> Here's one example:
> [1227348.003904] apache2 D 0000000000000000 0 10175 1773 0x00000004
> [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 0000000000015e00
> [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 ffff8801d88f0000
> [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 ffff8801d88f03d0
> [1227348.003912] Call Trace:
> [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 [sunrpc]
> [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90
> [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc]
> [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90
> [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40
> [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc]
> [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc]

That basically means that the process is hanging in the RPC layer, somewhere in the state machine. "echo 0 >/proc/sys/sunrpc/rpc_debug" as the "root" user should give us a dump of which state these RPC calls are in. Can you please try that?

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
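
For reference, the debugging step Trond is asking for amounts to the following (run as root; the task dump goes to the kernel log, so the dmesg line is just a convenient way to read it back):

  echo 0 > /proc/sys/sunrpc/rpc_debug
  dmesg | tail -n 100

The dump lists the outstanding RPC tasks and the state each one is waiting in.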


2014-03-06 16:02:49

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:

>
> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>
>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>> and not try to write anything to nfs.
>>>
>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>
>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>
> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.

Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit...

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 20:41:40

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 15:34, Brian Hawley <[email protected]> wrote:

>
> We're not intending to aggressively cache. There just happens to be a lot of free memory.
>

I'd suggest tuning down the "dirty_ratio" to a smaller value. Unless you need to rewrite it, you really are better off pushing the data to storage a little sooner.

Then, as I said, try the "echo 0 >/proc/sys/sunrpc/rpc_debug" during one of these hangs in order to find out where the RPC calls are waiting. Also, run that "netstat -tn" to see that the TCP connection to port 2049 on the server is up, and that there are free TCP ports in the range 665-1023.

>
> -----Original Message-----
> From: Trond Myklebust <[email protected]>
> Sender: [email protected]
> Date: Thu, 6 Mar 2014 15:31:33
> To: <[email protected]>
> Cc: <[email protected]>; Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 14:56, Brian Hawley <[email protected]> wrote:
>
>>
>> Given that the systems typically have 16GB's, the memory available for cache is usually around 13GB.
>>
>> Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application).
>>
>> Dirty ratio is 50 and dirty background ratio is 10.
>
> That means you can have up to 8GB to push out in one go. You can hardly blame NFS for being slow in that situation.
> Why do you need to cache these writes so aggressively? Is the data being edited and rewritten multiple times in the page cache before you want to push it to disk?
>
>> We set these to try to keep the data from cache always being pushed out.
>>
>> No oopses. Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data.
>>
>> It's unfortunate that when the i/o error is detected that the other packets can't just timeout right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard".
>
> Right, but the RPC layer does not know that these are all writes to the same file, and it can't be expected to know why the server isn't replying. For instance, I've known a single 'unlink' RPC call to take 17 minutes to complete on a server that had a lot of cleanup to do on that file; during that time, the server was happy to take RPC requests for other files...
>
>
>> -----Original Message-----
>> From: Trond Myklebust <[email protected]>
>> Sender: [email protected]
>> Date: Thu, 6 Mar 2014 14:47:48
>> To: <[email protected]>
>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>
>>
>> On Mar 6, 2014, at 14:33, Brian Hawley <[email protected]> wrote:
>>
>>>
>>> We do call fsync at synchronization points.
>>>
>>> The problem is the write() blocks forever (or for an exceptionally long time on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error.
>>
>> How much data are you trying to sync? "soft" won't time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the "dirty_background_bytes" (and/or "dirty_bytes") sysctl so that it starts writeback at an earlier time.
>>
>> Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs?
>>
>>> -----Original Message-----
>>> From: Trond Myklebust <[email protected]>
>>> Date: Thu, 6 Mar 2014 14:26:24
>>> To: <[email protected]>
>>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>>
>>>
>>> On Mar 6, 2014, at 14:14, Brian Hawley <[email protected]> wrote:
>>>
>>>>
>>>> Trond,
>>>>
>>>> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>>>
>>> My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.
>>>
>>>> And we check the return value of every i/o related command.
>>>
>>>> We aren't using synchronous because the performance becomes abysmal.
>>>>
>>>> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>>>>
>>>> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.
>>>
>>> Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.
>>>
>>>> -----Original Message-----
>>>> From: Trond Myklebust <[email protected]>
>>>> Date: Thu, 6 Mar 2014 14:06:24
>>>> To: <[email protected]>
>>>> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
>>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>>>
>>>>
>>>> On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:
>>>>
>>>>>
>>>>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.
>>>>
>>>> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.
>>>>
>>>> We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...
>>>>
>>>>>
>>>>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>>>>>
>>>>> In your case though, you're not writing.
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Andrew Martin <[email protected]>
>>>>> Date: Thu, 6 Mar 2014 10:43:42
>>>>> To: Jim Rees<[email protected]>
>>>>> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
>>>>> Subject: Re: Optimal NFS mount options to safely allow interrupts and
>>>>> timeouts on newer kernels
>>>>>
>>>>>> From: "Jim Rees" <[email protected]>
>>>>>> Andrew Martin wrote:
>>>>>>
>>>>>>> From: "Jim Rees" <[email protected]>
>>>>>>> Given this is apache, I think if I were doing this I'd use
>>>>>>> ro,soft,intr,tcp
>>>>>>> and not try to write anything to nfs.
>>>>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>>>>>> apache was attempting to do a write or a read, but it seems that
>>>>>> tcp,soft,intr
>>>>>> was not sufficient to prevent the problem.
>>>>>>
>>>>>> I had the impression from your original message that you were not using
>>>>>> "soft" and were asking if it's safe to use it. Are you saying that even with
>>>>>> the "soft" option the apache gets stuck forever?
>>>>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
>>>>> when the problem occurred (on several occasions), so my original question was
>>>>> if it would be safe to use a small timeo and retrans values to hopefully
>>>>> return I/O errors quickly to the application, rather than blocking forever
>>>>> (which causes the high load and inevitable reboot). It sounds like that isn't
>>>>> safe, but perhaps there is another way to resolve this problem?
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>> the body of a message to [email protected]
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>>>> _________________________________
>>>> Trond Myklebust
>>>> Linux NFS client maintainer, PrimaryData
>>>> [email protected]
>>>>
>>>
>>> _________________________________
>>> Trond Myklebust
>>> Linux NFS client maintainer, PrimaryData
>>> [email protected]
>>>
>>
>> _________________________________
>> Trond Myklebust
>> Linux NFS client maintainer, PrimaryData
>> [email protected]
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>>
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-06 15:26:39

by Chuck Lever III

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:

> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
> and not try to write anything to nfs.

I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.

Skip "intr" though, it really is a no-op after 2.6.25.

If your workload is really ONLY reading files that don't change often, you might consider "ro,soft,vers=3,nocto".

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
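
A rough illustration of Chuck's suggestion; server, export and mountpoint are placeholders:

  mount -t nfs -o ro,soft,proto=tcp,vers=3,nocto nfsserver:/export /var/www/static

Per nfs(5), nocto disables close-to-open cache revalidation, which is only appropriate when files on the export change rarely, as in the read-only web content case discussed here.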




2014-03-06 03:50:51

by NeilBrown

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <[email protected]>
wrote:

> Hello,
>
> Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g
> 3.2 and newer)? Currently using the "defaults" nfs mount options on Ubuntu
> 12.04 results in processes blocking forever in uninterruptable sleep if they
> attempt to access a mountpoint while the NFS server is offline. I would prefer
> that NFS simply return an error to the clients after retrying a few times,
> however I also cannot have data loss. From the man page, I think these options
> will give that effect?
> soft,proto=tcp,timeo=10,retrans=3
>
> >From my understanding, this will cause NFS to retry the connection 3 times (once
> per second), and then if all 3 are unsuccessful return an error to the
> application. Is this correct? Is there a risk of data loss or corruption by
> using "soft" in this way? Or is there a better way to approach this?

I think your best bet is to use an auto-mounter so that the filesystem gets
unmounted if the server isn't available.
"soft" always implies the risk of data loss. "Nulls Frequently Substituted"
as it was described to me very many years ago.

Possibly it would be good to have something between 'hard' and 'soft' for
cases like yours (you aren't the first to ask).

From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm

BSDI and OSF/1 also have a spongy option that is similar to hard, except
that the stat, lookup, fsstat, readlink, and readdir operations behave like a soft MOUNT.

Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed
experiment and there are good reasons not to want it.

NeilBrown


Attachments:
signature.asc (828.00 B)
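
A minimal autofs sketch of the automounter approach Neil suggests; map names, paths and the timeout are illustrative only:

  # /etc/auto.master
  /mnt/nfs  /etc/auto.nfs  --timeout=60

  # /etc/auto.nfs
  export  -fstype=nfs,ro,soft,proto=tcp  nfsserver:/export

The share is then mounted on first access under /mnt/nfs/export and unmounted again after 60 idle seconds, so an unreachable server is less likely to leave long-lived processes stuck on a stale mount.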

2014-03-06 19:56:35

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Given that the systems typically have 16GB's, the memory available for cache is usually around 13GB.

Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application).

Dirty ratio is 50 and dirty background ratio is 10.

We set these to try to keep the data from cache always being pushed out.

No oopses. Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data.

It's unfortunate that when the i/o error is detected that the other packets can't just timeout right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard".


2014-03-06 16:16:32

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 11:13, Chuck Lever <[email protected]> wrote:

>
> On Mar 6, 2014, at 11:02 AM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Mar 6, 2014, at 10:59, Chuck Lever <[email protected]> wrote:
>>
>>>
>>> On Mar 6, 2014, at 10:33 AM, Trond Myklebust <[email protected]> wrote:
>>>
>>>>
>>>> On Mar 6, 2014, at 10:26, Chuck Lever <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Mar 6, 2014, at 7:34 AM, Jim Rees <[email protected]> wrote:
>>>>>
>>>>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp
>>>>>> and not try to write anything to nfs.
>>>>>
>>>>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>>>>
>>>> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.
>>>
>>> Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.
>>
>> Is that the royal "we", or are you talking on behalf of all the QA departments and testers here? I call bullshit...
>
> If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.
>
> If anyone else had claimed a testing gap, you would have said "If that were the case, we would have a blatant read bug" and left it at that. But you had to go one needless and provocative step further.
>
> Stop bullying me, Trond. I've had enough of it.

Then stop spreading FUD. That is far from professional too...

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-03-05 20:41:31

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

----- Original Message -----
> From: "Jim Rees" <[email protected]>
> To: "Andrew Martin" <[email protected]>
> Cc: [email protected]
> Sent: Wednesday, March 5, 2014 2:11:49 PM
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
> I prefer hard,intr which lets you interrupt the hung process.
>
Isn't intr/nointr deprecated (since kernel 2.6.25)?

2014-03-06 03:35:04

by NeilBrown

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <[email protected]> wrote:

> Andrew Martin wrote:
>
> Isn't intr/nointr deprecated (since kernel 2.6.25)?
>
> It isn't so much that it's deprecated as that it's now the default (except
> that only SIGKILL will work).

Not quite correct. Any signal will work providing its behaviour is to kill
the process. So SIGKILL will always work, and SIGTERM, SIGINT, SIGQUIT, etc.
will work providing they aren't caught or ignored by the process.

NeilBrown


> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


Attachments:
signature.asc (828.00 B)
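
In practice that means a hung process can still be killed from another shell, and whether a milder signal is enough depends on the application; illustrative commands only:

  kill -TERM <pid>   # interrupts the RPC wait only if SIGTERM is not caught or ignored
  kill -KILL <pid>   # always fatal, so it always interrupts the wait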

2014-03-06 18:26:59

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 12:36, Jim Rees <[email protected]> wrote:

> Why would a bunch of blocked apaches cause high load and reboot?

Good question. Are the TCP reconnect attempts perhaps eating up all the reserved ports and leaving them in the TIME_WAIT state? "netstat -tn" should list all the ports currently in use by TCP connections.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
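
A quick way to check this theory on the client; the commands are illustrative, 2049 being the standard NFS port:

  netstat -tn | grep ':2049 '
  netstat -tn | awk '$6 == "TIME_WAIT"' | wc -l

A large and growing TIME_WAIT count on privileged source ports would point at reconnect attempts exhausting the reserved port range.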


2014-03-06 19:26:27

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 14:14, Brian Hawley <[email protected]> wrote:

>
> Trond,
>
> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().

My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.

> And we check the return value of every i/o related command.

> We aren't using synchronous because the performance becomes abysmal.
>
> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>
> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.

Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.

> -----Original Message-----
> From: Trond Myklebust <[email protected]>
> Date: Thu, 6 Mar 2014 14:06:24
> To: <[email protected]>
> Cc: Andrew Martin<[email protected]>; Jim Rees<[email protected]>; Brown Neil<[email protected]>; <[email protected]>; <[email protected]>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> On Mar 6, 2014, at 14:00, Brian Hawley <[email protected]> wrote:
>
>>
>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.
>
> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself.
>
> We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn't using synchronous I/O, and it isn't checking the return values of fsync() or close(), then there is little the kernel can do...
>
>>
>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.
>>
>> In your case though, you're not writing.
>>
>>
>> -----Original Message-----
>> From: Andrew Martin <[email protected]>
>> Date: Thu, 6 Mar 2014 10:43:42
>> To: Jim Rees<[email protected]>
>> Cc: <[email protected]>; NeilBrown<[email protected]>; <[email protected]>; <[email protected]>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and
>> timeouts on newer kernels
>>
>>> From: "Jim Rees" <[email protected]>
>>> Andrew Martin wrote:
>>>
>>>> From: "Jim Rees" <[email protected]>
>>>> Given this is apache, I think if I were doing this I'd use
>>>> ro,soft,intr,tcp
>>>> and not try to write anything to nfs.
>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if
>>> apache was attempting to do a write or a read, but it seems that
>>> tcp,soft,intr
>>> was not sufficient to prevent the problem.
>>>
>>> I had the impression from your original message that you were not using
>>> "soft" and were asking if it's safe to use it. Are you saying that even with
>>> the "soft" option the apache gets stuck forever?
>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr
>> when the problem occurred (on several occasions), so my original question was
>> if it would be safe to use a small timeo and retrans values to hopefully
>> return I/O errors quickly to the application, rather than blocking forever
>> (which causes the high load and inevitable reboot). It sounds like that isn't
>> safe, but perhaps there is another way to resolve this problem?
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> _________________________________
> Trond Myklebust
> Linux NFS client maintainer, PrimaryData
> [email protected]
>

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
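
A simple way to see the behaviour Trond describes from the command line; the path is a placeholder and this is only a demonstration:

  dd if=/dev/zero of=/mnt/export/testfile bs=1M count=16 conv=fsync
  echo $?

With buffered I/O the writes themselves usually land in the page cache and succeed; on a soft mount with an unreachable server, the error is reported when the data is flushed, i.e. at the conv=fsync step here, or at fsync()/close() in an application.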


2014-03-06 19:00:22

by Brian Hawley

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway.

With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written.

In your case though, you're not writing.


2014-03-06 21:01:06

by Trond Myklebust

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels


On Mar 6, 2014, at 15:45, Andrew Martin <[email protected]> wrote:

> ----- Original Message -----
>> From: "Trond Myklebust" <[email protected]>
>>> I attempted to get a backtrace from one of the uninterruptible apache
>>> processes:
>>> echo w > /proc/sysrq-trigger
>>>
>>> Here's one example:
>>> [1227348.003904] apache2 D 0000000000000000 0 10175 1773
>>> 0x00000004
>>> [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00
>>> 0000000000015e00
>>> [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00
>>> ffff8801d88f0000
>>> [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00
>>> ffff8801d88f03d0
>>> [1227348.003912] Call Trace:
>>> [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
>>> [sunrpc]
>>> [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40
>>> [sunrpc]
>>> [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90
>>> [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40
>>> [sunrpc]
>>> [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90
>>> [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40
>>> [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc]
>>> [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc]
>>
>> That basically means that the process is hanging in the RPC layer, somewhere
>> in the state machine. "echo 0 >/proc/sys/sunrpc/rpc_debug" as the "root"
>> user should give us a dump of which state these RPC calls are in. Can you
>> please try that?
> Yes I will definitely run that the next time it happens, but since it occurs
> sporadically (and I have not yet found a way to reproduce it on demand), it
> could be days before it occurs again. I'll also run "netstat -tn" to check the
> TCP connections the next time this happens.

If you are comfortable applying patches and compiling your own kernels, then you might want to try applying the fix for a certain out-of-socket-buffer race that Neil reported, and that I suspect you may be hitting. The patch has been sent to the "stable kernel" series, and so should appear soon in Debian's own kernels, but if this is bothering you now, then go for it...

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=06ea0bfe6e6043cb56a78935a19f6f8ebc636226

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]


2014-04-04 18:15:50

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Trond,

----- Original Message -----
> From: "Brian Hawley" <[email protected]>
> To: "Ric Wheeler" <[email protected]>, "Brian Hawley" <[email protected]>, "Trond Myklebust"
> <[email protected]>
> Cc: "Andrew Martin" <[email protected]>, "Jim Rees" <[email protected]>, "Brown Neil" <[email protected]>,
> [email protected], [email protected]
> Sent: Thursday, March 6, 2014 1:38:15 PM
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
>
> I agree completely that the write() returning only means it's in the page
> cache.
>
> I agree completely that fsync() result is the only way to know your data is
> safe.
>
> Neither of those is what I, the original poster, or other posters on this
> subject in the past are disputing or concerned about.
>
> The issue is, the write() call (in my case - read() in the original poster's
> case) does NOT return.
Is it possible with the "sync" mount option (or via another method) to force
all writes to fsync and fail immediately if they do not succeed? In other
words, skip the cache? For some applications I'd rather pass the error back up
to the application right away for it to handle (even if the error is caused
by network turbulence) rather than risk getting into this situation where
writes block forever.
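
To make the idea concrete, here is a minimal sketch (hypothetical path,
abbreviated error handling, and assuming a soft mount) of the behaviour I'm
after: open with O_SYNC so each write() has to be acknowledged by the server
before it returns, and check the result of every write() and close():

/*
 * Minimal sketch, not a tested recipe: with O_SYNC each write() must be
 * acknowledged by the NFS server before it returns, so on a soft mount a
 * major timeout should surface here as EIO rather than sitting in the page
 * cache until a later fsync()/close(). The path below is made up.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/nfs/example.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static const char buf[] = "example payload\n";
    if (write(fd, buf, sizeof(buf) - 1) < 0) {
        /* On a soft mount, a major timeout is reported here as an error. */
        fprintf(stderr, "write failed: %s\n", strerror(errno));
        close(fd);
        return 1;
    }

    if (close(fd) < 0) {    /* close() can also report a deferred error */
        perror("close");
        return 1;
    }
    return 0;
}

I assume the same effect could be had by calling fsync() after each write()
instead of opening with O_SYNC, at the cost of an extra syscall per write.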

Thanks,

Andrew

2014-04-04 18:16:15

by Andrew Martin

[permalink] [raw]
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

Bruce,

----- Original Message -----
> From: "Dr Fields James Bruce" <[email protected]>
> > Bruce, it looks like the above should have been fixed in Linux 2.6.35 with
> > commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is
> > that correct?
>
> Yes, that definitely looks like it would explain the bug. And the sysrq
> trace shows 2.6.32-57.
>
> Andrew Martin, can you confirm that the problem is no longer
> reproducible on a kernel with that patch applied?
I have upgraded to 3.0.0-32. Since this problem is intermittent, I'm not sure
when I will be able to reproduce it (if ever), but I'll reply to this thread
if it ever recurs.

Thanks everyone for the help!

Andrew