2012-06-16 02:42:46

by Andre Tomt

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 01. juni 2012 05:41, Fengguang Wu wrote:
> Hi David,
>
> It's the second time the machine oops in __key_instantiate_and_link() on
> 3.4 kernel. This bug only happens after several days of run. Do you
> have any advices or debug patches? For now I can try older (or newer)
> kernels and see if it's any better.

FWIW, I am (still) seeing this exact same crash several times an hour on
3.4.3-rc1 on an Ubuntu 12.04 client. The only difference is that, the same
second it happens, all three of my displays corrupt badly and become
completely unreadable. Switching to the console and back usually gets my
desktop back on two of the three displays, and sometimes it needs a full
X server restart. Obviously some memory corruption is going on.

One of the crashes also triggered an NX error:
[20292.196332] kernel tried to execute NX-protected page - exploit
attempt? (uid: 0)

And after a while, things start locking up.

It didn't really start happening until a few days ago though; I've been
running 3.4 since some -rc, through all the stable releases. Perhaps the
server suddenly got a working idmapper or something? It's Debian
unstable, updated a couple of times a month.

Booting the latest git master now, to see if any of the recent NFS fixes
just pulled by Linus fix anything (-rc2 had other showstopper NFS issues).

>
> [53056.100019] BUG: unable to handle kernel paging request at 0000632e6472616f
> [53056.108072] IP: [<0000632e6472616f>] 0x632e6472616e
> [53056.113702] PGD 0
> [53056.116119] Oops: 0010 [#1] SMP
> [53056.119982] CPU 0
> [53056.122111] Modules linked in:
> [53056.125969]
> [53056.127713] Pid: 3502, comm: rpc.idmapd Not tainted 3.4.0 #130 Intel Corporation S2600CP/S2600CP
> [53056.137880] RIP: 0010:[<0000632e6472616f>] [<0000632e6472616f>] 0x632e6472616e
> [53056.146228] RSP: 0018:ffff880421d2fd30 EFLAGS: 00010246
> [53056.152249] RAX: ffff8803bef69100 RBX: ffff8803bef690d0 RCX: ffffffff81f0f440
> [53056.160309] RDX: 0000000000000005 RSI: ffff880421d2fe8d RDI: ffff8803bef690d0
> [53056.168371] RBP: ffff880421d2fd88 R08: ffff880421d2fcd8 R09: 0000000000000000
> [53056.176435] R10: ffffffff81f0f468 R11: ffffffff813bbdbf R12: ffff880421d2fe8d
> [53056.184504] R13: ffff8805c9957b00 R14: ffff8804287b9c00 R15: 0000000000000000
> [53056.192564] FS: 00007f87fbce5700(0000) GS:ffff88042fa00000(0000) knlGS:0000000000000000
> [53056.202451] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [53056.209343] CR2: 0000632e6472616f CR3: 000000042951a000 CR4: 00000000000407f0
> [53056.217784] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [53056.226226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [53056.234675] Process rpc.idmapd (pid: 3502, threadinfo ffff880421d2e000, task ffff880427865e20)
> [53056.245144] Stack:
> [53056.247841] ffffffff813bbde2 ffff880416c54820 0000000000000005 ffff880421d2fdb0
> [53056.257281] fffffff0287b9c28 ffff8803bef690b0 ffff8803bef690d0 ffff8804287b9c00
> [53056.266725] 0000000000000005 ffff880421d2fe8d 00007f87fbaea800 ffff880421d2fdd8
> [53056.276172] Call Trace:
> [53056.279363] [<ffffffff813bbde2>] ? __key_instantiate_and_link+0x5e/0xe4
> [53056.287316] [<ffffffff813bbec5>] key_instantiate_and_link+0x5d/0x85
> [53056.294888] [<ffffffff81262c29>] idmap_pipe_downcall+0x14a/0x18f
> [53056.302176] [<ffffffff81958071>] rpc_pipe_write+0x5d/0x77
> [53056.308782] [<ffffffff8114bfbd>] vfs_write+0xb2/0x142
> [53056.315000] [<ffffffff8114c247>] sys_write+0x4a/0x71
> [53056.321108] [<ffffffff819bc329>] system_call_fastpath+0x16/0x1b
> [53056.328278] Code: Bad RIP value.
> [53056.332599] RIP [<0000632e6472616f>] 0x632e6472616e
> [53056.338687] RSP<ffff880421d2fd30>
> [53056.343035] CR2: 0000632e6472616f
> [53056.347640] ---[ end trace ee5d4100fdd3e1d2 ]---
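
A quick way to sanity-check the memory-corruption suspicion is to read the faulting address as if it were text. The helper below is a minimal sketch (the name `reg_to_ascii` is made up for illustration): it treats a 64-bit register value as the little-endian bytes it would have had in memory and returns the printable ASCII prefix.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Interpret a 64-bit register value as the little-endian byte sequence it
 * would have had in memory, and copy the printable ASCII prefix into out.
 * Returns the number of characters written (excluding the terminating NUL).
 */
size_t reg_to_ascii(uint64_t value, char out[9])
{
    size_t n = 0;

    for (int i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)(value >> (8 * i));

        if (byte < 0x20 || byte > 0x7e)
            break;              /* stop at the first non-printable byte */
        out[n++] = (char)byte;
    }
    out[n] = '\0';
    return n;
}
```

Called with 0x0000632e6472616f (the RIP/CR2 value in the trace above), this yields the six characters "oard.c". That looks like the tail of a string rather than a code address, which would fit a function pointer being read from memory that now holds unrelated data. It does not by itself say where the string came from.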


2012-06-18 09:04:59

by Andre Tomt

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 16. juni 2012 21:59, Myklebust, Trond wrote:
> It looks to me as if the legacy upcall code is assuming that there can
> be no more than 1 upcall at a time: there is only a single
> idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
> and then read in idmap_pipe_downcall.
>
> Bryan, can you look into this? I suspect that we need a mutex or
> something like that (for the legacy upcall case only) to ensure that
> nobody overwrites the idmap->idmap_key_cons while an upcall is in
> progress.
>
> Andre, if you want idmapper scalability, then you should rather use the
> new idmapper upcall. You need a recent version of the nfs-utils package,
> the keyutils package, and they you should add an 'id_resolver' line
> to /etc/request-keys.conf as per the nfsidmap manpage.

Indeed, using keyutils did avoid the crashes here, 40 hours and counting.

Are there any downsides to having keyutils w/ id_resolver on by default
in a distribution? Would it break older kernels or nfs-utils (just not
getting used is fine, obviously)?
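
For reference, the 'id_resolver' line discussed here usually looks like the fragment below. This follows the example in the nfsidmap man page as far as I recall it; the file is normally /etc/request-key.conf (singular "key"), and the path to the nfsidmap binary is an assumption that may differ between distributions.

```
# /etc/request-key.conf
#OP     TYPE         DESCRIPTION  CALLOUT INFO  PROGRAM ARG1 ARG2 ...
create  id_resolver  *            *             /usr/sbin/nfsidmap %k %d
```

With such a line in place, kernels that support the keyring-based idmapper call nfsidmap through request-key instead of going through the rpc.idmapd pipe.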

2012-06-26 07:24:21

by Andre Tomt

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 20. juni 2012 20:27, Bryan Schumaker wrote:
> Hi Andre,
>
> Have you had a chance to test my patch (below) yet?

I reverted from using keyutils yesterday, verified the problem had
returned, then patched the same kernel with just this patch - so far it's
been running fine for 15 hours. Without the patch it would fall over
within 1-2 hours.

So thumbs up from me.

Sorry about the delay, once I got it working with keyutils I suddenly
got very, very lazy ;-)

2012-06-26 12:43:00

by Anna Schumaker

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 06/26/2012 03:24 AM, Andre Tomt wrote:
> On 20. juni 2012 20:27, Bryan Schumaker wrote:
>> Hi Andre,
>>
>> Have you had a chance to test my patch (below) yet?
>
> I reverted from using keyutils yesterday, verified the problem had returned, then patched the same kernel with just this patch - so far its been running fine for 15 hours. Without the patch it would fall over within 1-2 hours.
>
> So thumbs up from me
>
> Sorry about the delay, once I got it working with keyutils I suddenly got very, very lazy ;-)

Cool, thanks for checking this! You can go back to whichever method you like better now :).

- Bryan


2012-06-18 12:44:11

by Anna Schumaker

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 06/16/2012 03:59 PM, Myklebust, Trond wrote:
> On Sat, 2012-06-16 at 20:43 +0200, Andre Tomt wrote:
>> On 16. juni 2012 04:32, Andre Tomt wrote:
>>> FWIW; I am (still) seeing this exact same crash several times an hour on
>>> 3.4.3-rc1 on Ubuntu 12.04 client. Only that the same second it happens,
>>> all my three displays corrupts badly, becoming completely unreadable.
>>> Switching to console and back usually gets my desktop back on two of
>>> three displays, and sometimes it will need a full xserver restart.
>>> Obviously some memory corruption going on.
>>>
>>> One of the crashes also triggered a NX error:
>>> [20292.196332] kernel tried to execute NX-protected page - exploit
>>> attempt? (uid: 0)
>>>
>>> And after a while, things start locking up.
>>>
>>> It didn't really start happening until a few days ago though, I've been
>>> running 3.4 for since some -rc through all the stable releases. Perhaps
>>> server suddenly got a working idmapper or something? Its a debian
>>> unstable updated a couple times a month.
>>>
>>> Booting latest git master now, to see if any of the recent NFS fixes
>>> just pulled by Linus fixes anything (-rc2 had other showstopper nfs
>>> issues).
>>
>> Just had it happen with 3.5-git as of a couple hours ago (last commit
>> a2c2df8672f55195f101d9251117aa59e358d296):
>> Jun 16 19:01:49 slurv kernel: [50823.271618] general protection fault:
>> 0000 [#1] SMP
>> Jun 16 19:01:49 slurv kernel: [50823.271640] CPU 10
>> Jun 16 19:01:49 slurv kernel: [50823.271690]
>> Jun 16 19:01:49 slurv kernel: [50823.271698] Pid: 1678, comm: rpc.idmapd
>> Not tainted 3.5.0-1-desktop #1 System manufacturer System Product
>> Name/P6T DELUXE V2
>> Jun 16 19:01:49 slurv kernel: [50823.271738] RIP:
>> 0010:[<ffffffff81129872>] [<ffffffff81129872>]
>> __key_instantiate_and_link+0x52/0xcb
>> Jun 16 19:01:49 slurv kernel: [50823.271767] RSP: 0018:ffff88061941bd38
>> EFLAGS: 00010246
>> Jun 16 19:01:49 slurv kernel: [50823.271785] RAX: 6337346330366233 RBX:
>> ffff880606fb97f0 RCX: 0000000000000000
>> Jun 16 19:01:49 slurv kernel: [50823.271807] RDX: 0000000000000006 RSI:
>> ffff88061941be85 RDI: ffff880606fb97f0
>> Jun 16 19:01:49 slurv kernel: [50823.271829] RBP: ffff88061941bd88 R08:
>> ffff8802bc9ed380 R09: ffff88061941bdb0
>> Jun 16 19:01:49 slurv kernel: [50823.271850] R10: 0000000000000000 R11:
>> 0000000000000000 R12: ffff88061aad63c0
>> Jun 16 19:01:49 slurv kernel: [50823.271872] R13: ffff8802bc9ed380 R14:
>> 0000000000000000 R15: ffff88061941bdb0
>> Jun 16 19:01:49 slurv kernel: [50823.271894] FS: 00007fd449a7d700(0000)
>> GS:ffff88063fd40000(0000) knlGS:0000000000000000
>> Jun 16 19:01:49 slurv kernel: [50823.271919] CS: 0010 DS: 0000 ES: 0000
>> CR0: 0000000080050033
>> Jun 16 19:01:49 slurv kernel: [50823.271937] CR2: 00007f8844002028 CR3:
>> 000000061be95000 CR4: 00000000000007e0
>> Jun 16 19:01:49 slurv kernel: [50823.271959] DR0: 0000000000000000 DR1:
>> 0000000000000000 DR2: 0000000000000000
>> Jun 16 19:01:49 slurv kernel: [50823.271980] DR3: 0000000000000000 DR6:
>> 00000000ffff0ff0 DR7: 0000000000000400
>> Jun 16 19:01:49 slurv kernel: [50823.272003] Process rpc.idmapd (pid:
>> 1678, threadinfo ffff88061941a000, task ffff88061b870000)
>> Jun 16 19:01:49 slurv kernel: [50823.272028] Stack:
>> Jun 16 19:01:49 slurv kernel: [50823.272036] ffff88061941bd78
>> 0000000000000006 ffff88061941be85 fffffff01aad63e8
>> Jun 16 19:01:49 slurv kernel: [50823.272060] ffff880606fb9840
>> ffff880606fb97f0 ffff88061aad63c0 0000000000000006
>> Jun 16 19:01:49 slurv kernel: [50823.272083] ffff88061941be85
>> 00007fffe09c4500 ffff88061941bdd8 ffffffff81129943
>> Jun 16 19:01:49 slurv kernel: [50823.272106] Call Trace:
>> Jun 16 19:01:49 slurv kernel: [50823.272116] [<ffffffff81129943>]
>> key_instantiate_and_link+0x58/0x80
>> Jun 16 19:01:49 slurv kernel: [50823.272145] [<ffffffffa058a906>]
>> idmap_pipe_downcall+0x154/0x1ad [nfs]
>> Jun 16 19:01:49 slurv kernel: [50823.272173] [<ffffffffa04aeae2>]
>> rpc_pipe_write+0x56/0x6f [sunrpc]
>> Jun 16 19:01:49 slurv kernel: [50823.272195] [<ffffffff810c86ce>]
>> vfs_write+0xad/0x13d
>> Jun 16 19:01:49 slurv kernel: [50823.272212] [<ffffffff810c8949>]
>> sys_write+0x45/0x6c
>> Jun 16 19:01:49 slurv kernel: [50823.272229] [<ffffffff8130a462>]
>> system_call_fastpath+0x16/0x1b
>> Jun 16 19:01:49 slurv kernel: [50823.272258] Code: 48 89 55 b8 48 89 75
>> c0 e8 16 e4 1d 00 48 8b 43 78 c7 45 cc f0 ff ff ff 48 8b 55 b8 48 8b 75
>> c0 a8 01 75 4f 48 8b 43 20 48 89 df <ff> 50 18 85 c0 89 45 cc 75 3e 48
>> 8b 43 48 f0 ff 40 44 f0 80 4b
>> Jun 16 19:01:49 slurv kernel: [50823.272317] RIP [<ffffffff81129872>]
>> __key_instantiate_and_link+0x52/0xcb
>> Jun 16 19:01:49 slurv kernel: [50823.272339] RSP <ffff88061941bd38>
>> Jun 16 19:01:49 slurv kernel: [50823.279199] ---[ end trace
>> 25122b5e9d0b0c76 ]---
>>
>> It did take a while this time.
>
> It looks to me as if the legacy upcall code is assuming that there can
> be no more than 1 upcall at a time: there is only a single
> idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
> and then read in idmap_pipe_downcall.
>
> Bryan, can you look into this? I suspect that we need a mutex or
> something like that (for the legacy upcall case only) to ensure that
> nobody overwrites the idmap->idmap_key_cons while an upcall is in
> progress.

Sure, I'll look at this now. Adding in a mutex sounds like a simple enough fix.

- Bryan
>
> Andre, if you want idmapper scalability, then you should rather use the
> new idmapper upcall. You need a recent version of the nfs-utils package,
> the keyutils package, and they you should add an 'id_resolver' line
> to /etc/request-keys.conf as per the nfsidmap manpage.
>
> Cheers
> Trond
>
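
The single-slot problem Trond describes can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `sim_request`, `sim_idmap`, `sim_upcall` and `sim_downcall` are hypothetical stand-ins, not the kernel's structures or functions, and the model ignores the pipe and the daemon entirely. It just shows what happens when two upcalls share one `idmap_key_cons`-style slot.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct sim_request {
    int  id;
    bool completed;
};

struct sim_idmap {
    struct sim_request *pending;    /* plays the role of idmap_key_cons */
};

/* Upcall: remember which request the daemon is being asked about. */
void sim_upcall(struct sim_idmap *idmap, struct sim_request *req)
{
    idmap->pending = req;           /* silently overwrites any earlier request */
}

/*
 * Downcall: the daemon's answer is applied to whatever request the slot
 * holds. Returns the id that was completed, or -1 if the slot is empty or
 * holds a request that was already completed (in a kernel, such an object
 * might already have been freed).
 */
int sim_downcall(struct sim_idmap *idmap)
{
    struct sim_request *req = idmap->pending;

    if (req == NULL || req->completed)
        return -1;
    req->completed = true;
    return req->id;
}
```

If request 1 and then request 2 are passed to `sim_upcall` before any answer arrives, the first `sim_downcall` completes request 2, the second finds a stale entry, and request 1 is never completed. In the model that is merely a wrong answer; with real, freeable objects the stale pointer is where memory corruption can come from.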



2012-06-18 16:11:58

by Anna Schumaker

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 06/18/2012 09:28 AM, Myklebust, Trond wrote:
> On Mon, 2012-06-18 at 11:04 +0200, Andre Tomt wrote:
>> On 16. juni 2012 21:59, Myklebust, Trond wrote:
>>> It looks to me as if the legacy upcall code is assuming that there can
>>> be no more than 1 upcall at a time: there is only a single
>>> idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
>>> and then read in idmap_pipe_downcall.
>>>
>>> Bryan, can you look into this? I suspect that we need a mutex or
>>> something like that (for the legacy upcall case only) to ensure that
>>> nobody overwrites the idmap->idmap_key_cons while an upcall is in
>>> progress.
>>>
>>> Andre, if you want idmapper scalability, then you should rather use the
>>> new idmapper upcall. You need a recent version of the nfs-utils package,
>>> the keyutils package, and they you should add an 'id_resolver' line
>>> to /etc/request-keys.conf as per the nfsidmap manpage.
>>
>> Indeed, using keyutils did avoid the crashes here, 40 hours and counting.
>>
>> Are there any downsides of having keyutils w/ id_resolver on by default
>> in a distribution? Would it break older kernels or nfs-utils (just not
>> getting used is fine, obviously)?
>
> Older kernels aren't able to use the keyutils mechanism, so they will
> still require you to run the idmapd daemon, but there should be no
> problems with just enabling it in /etc/request-key.conf.
> Fedora 17 is supposed to install the id_resolver by default.
>

Hi Andre,

Can you please check if this patch fixes the old idmapper?

- Bryan

From 3bef58765c7308965d06d9e9d7707c7ca55648ee Mon Sep 17 00:00:00 2001
From: Bryan Schumaker <[email protected]>
Date: Mon, 18 Jun 2012 12:01:25 -0400
Subject: [PATCH] NFS: Force the legacy idmapper to be single threaded

The legacy idmapper was initially coded under the assumption that there
would only be one request at a time, so use a lock to enforce this
requirement.

Signed-off-by: Bryan Schumaker <[email protected]>
---
 fs/nfs/idmap.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/idmap.c b/fs/nfs/idmap.c
index b5b86a0..864c51e 100644
--- a/fs/nfs/idmap.c
+++ b/fs/nfs/idmap.c
@@ -57,6 +57,11 @@ unsigned int nfs_idmap_cache_timeout = 600;
 static const struct cred *id_resolver_cache;
 static struct key_type key_type_id_resolver_legacy;
 
+struct idmap {
+	struct rpc_pipe		*idmap_pipe;
+	struct key_construction	*idmap_key_cons;
+	struct mutex		idmap_mutex;
+};
 
 /**
  * nfs_fattr_init_names - initialise the nfs_fattr owner_name/group_name fields
@@ -310,9 +315,11 @@ static ssize_t nfs_idmap_get_key(const char *name, size_t namelen,
 					    name, namelen, type, data,
 					    data_size, NULL);
 	if (ret < 0) {
+		mutex_lock(&idmap->idmap_mutex);
 		ret = nfs_idmap_request_key(&key_type_id_resolver_legacy,
 					    name, namelen, type, data,
 					    data_size, idmap);
+		mutex_unlock(&idmap->idmap_mutex);
 	}
 	return ret;
 }
@@ -354,11 +361,6 @@ static int nfs_idmap_lookup_id(const char *name, size_t namelen, const char *typ
 /* idmap classic begins here */
 module_param(nfs_idmap_cache_timeout, int, 0644);
 
-struct idmap {
-	struct rpc_pipe		*idmap_pipe;
-	struct key_construction	*idmap_key_cons;
-};
-
 enum {
 	Opt_find_uid, Opt_find_gid, Opt_find_user, Opt_find_group, Opt_find_err
 };
@@ -469,6 +471,7 @@ nfs_idmap_new(struct nfs_client *clp)
 		return error;
 	}
 	idmap->idmap_pipe = pipe;
+	mutex_init(&idmap->idmap_mutex);
 
 	clp->cl_idmap = idmap;
 	return 0;
--
1.7.10.4
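
The locking pattern in the patch (take a mutex, publish the request in the single slot, wait for the answer, release) can be tried out in userspace with pthreads. The following is a rough model, not kernel code: `locked_idmap`, `locked_worker` and `locked_run` are invented names, the "daemon" is reduced to a `sched_yield()`, and the thread and round counts are arbitrary.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define SIM_THREADS 4
#define SIM_ROUNDS  1000

/* Hypothetical userspace model; names are illustrative only. */
struct locked_idmap {
    const void     *pending;    /* single slot, like idmap_key_cons */
    pthread_mutex_t lock;       /* like idmap_mutex in the patch */
    int             clobbered;  /* requests that saw a foreign slot value */
};

static void *locked_worker(void *arg)
{
    struct locked_idmap *idmap = arg;
    int token;                  /* its address is unique per thread */

    for (int i = 0; i < SIM_ROUNDS; i++) {
        pthread_mutex_lock(&idmap->lock);
        idmap->pending = &token;        /* "upcall": publish our request */
        sched_yield();                  /* let other threads try to run */
        if (idmap->pending != &token)   /* "downcall": is it still ours? */
            idmap->clobbered++;
        idmap->pending = NULL;
        pthread_mutex_unlock(&idmap->lock);
    }
    return NULL;
}

/* Runs the workers and returns how many requests were clobbered. */
int locked_run(void)
{
    struct locked_idmap idmap = { .pending = NULL, .clobbered = 0 };
    pthread_t threads[SIM_THREADS];

    pthread_mutex_init(&idmap.lock, NULL);
    for (int i = 0; i < SIM_THREADS; i++)
        pthread_create(&threads[i], NULL, locked_worker, &idmap);
    for (int i = 0; i < SIM_THREADS; i++)
        pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&idmap.lock);
    return idmap.clobbered;
}
```

Because every access to the slot happens with the mutex held, `locked_run()` should always return 0. The cost is the one named in the patch subject: legacy requests are fully serialized, so anyone who needs concurrency is better served by the keyring-based idmapper. Build with `-pthread` on toolchains that need it.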



2012-06-16 19:59:08

by Myklebust, Trond

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On Sat, 2012-06-16 at 20:43 +0200, Andre Tomt wrote:
> On 16. juni 2012 04:32, Andre Tomt wrote:
> > FWIW; I am (still) seeing this exact same crash several times an hour on
> > 3.4.3-rc1 on Ubuntu 12.04 client. Only that the same second it happens,
> > all my three displays corrupts badly, becoming completely unreadable.
> > Switching to console and back usually gets my desktop back on two of
> > three displays, and sometimes it will need a full xserver restart.
> > Obviously some memory corruption going on.
> >
> > One of the crashes also triggered a NX error:
> > [20292.196332] kernel tried to execute NX-protected page - exploit
> > attempt? (uid: 0)
> >
> > And after a while, things start locking up.
> >
> > It didn't really start happening until a few days ago though, I've been
> > running 3.4 for since some -rc through all the stable releases. Perhaps
> > server suddenly got a working idmapper or something? Its a debian
> > unstable updated a couple times a month.
> >
> > Booting latest git master now, to see if any of the recent NFS fixes
> > just pulled by Linus fixes anything (-rc2 had other showstopper nfs
> > issues).
>
> Just had it happen with 3.5-git as of a couple hours ago (last commit
> a2c2df8672f55195f101d9251117aa59e358d296):
> Jun 16 19:01:49 slurv kernel: [50823.271618] general protection fault:
> 0000 [#1] SMP
> Jun 16 19:01:49 slurv kernel: [50823.271640] CPU 10
> Jun 16 19:01:49 slurv kernel: [50823.271690]
> Jun 16 19:01:49 slurv kernel: [50823.271698] Pid: 1678, comm: rpc.idmapd
> Not tainted 3.5.0-1-desktop #1 System manufacturer System Product
> Name/P6T DELUXE V2
> Jun 16 19:01:49 slurv kernel: [50823.271738] RIP:
> 0010:[<ffffffff81129872>] [<ffffffff81129872>]
> __key_instantiate_and_link+0x52/0xcb
> Jun 16 19:01:49 slurv kernel: [50823.271767] RSP: 0018:ffff88061941bd38
> EFLAGS: 00010246
> Jun 16 19:01:49 slurv kernel: [50823.271785] RAX: 6337346330366233 RBX:
> ffff880606fb97f0 RCX: 0000000000000000
> Jun 16 19:01:49 slurv kernel: [50823.271807] RDX: 0000000000000006 RSI:
> ffff88061941be85 RDI: ffff880606fb97f0
> Jun 16 19:01:49 slurv kernel: [50823.271829] RBP: ffff88061941bd88 R08:
> ffff8802bc9ed380 R09: ffff88061941bdb0
> Jun 16 19:01:49 slurv kernel: [50823.271850] R10: 0000000000000000 R11:
> 0000000000000000 R12: ffff88061aad63c0
> Jun 16 19:01:49 slurv kernel: [50823.271872] R13: ffff8802bc9ed380 R14:
> 0000000000000000 R15: ffff88061941bdb0
> Jun 16 19:01:49 slurv kernel: [50823.271894] FS: 00007fd449a7d700(0000)
> GS:ffff88063fd40000(0000) knlGS:0000000000000000
> Jun 16 19:01:49 slurv kernel: [50823.271919] CS: 0010 DS: 0000 ES: 0000
> CR0: 0000000080050033
> Jun 16 19:01:49 slurv kernel: [50823.271937] CR2: 00007f8844002028 CR3:
> 000000061be95000 CR4: 00000000000007e0
> Jun 16 19:01:49 slurv kernel: [50823.271959] DR0: 0000000000000000 DR1:
> 0000000000000000 DR2: 0000000000000000
> Jun 16 19:01:49 slurv kernel: [50823.271980] DR3: 0000000000000000 DR6:
> 00000000ffff0ff0 DR7: 0000000000000400
> Jun 16 19:01:49 slurv kernel: [50823.272003] Process rpc.idmapd (pid:
> 1678, threadinfo ffff88061941a000, task ffff88061b870000)
> Jun 16 19:01:49 slurv kernel: [50823.272028] Stack:
> Jun 16 19:01:49 slurv kernel: [50823.272036] ffff88061941bd78
> 0000000000000006 ffff88061941be85 fffffff01aad63e8
> Jun 16 19:01:49 slurv kernel: [50823.272060] ffff880606fb9840
> ffff880606fb97f0 ffff88061aad63c0 0000000000000006
> Jun 16 19:01:49 slurv kernel: [50823.272083] ffff88061941be85
> 00007fffe09c4500 ffff88061941bdd8 ffffffff81129943
> Jun 16 19:01:49 slurv kernel: [50823.272106] Call Trace:
> Jun 16 19:01:49 slurv kernel: [50823.272116] [<ffffffff81129943>]
> key_instantiate_and_link+0x58/0x80
> Jun 16 19:01:49 slurv kernel: [50823.272145] [<ffffffffa058a906>]
> idmap_pipe_downcall+0x154/0x1ad [nfs]
> Jun 16 19:01:49 slurv kernel: [50823.272173] [<ffffffffa04aeae2>]
> rpc_pipe_write+0x56/0x6f [sunrpc]
> Jun 16 19:01:49 slurv kernel: [50823.272195] [<ffffffff810c86ce>]
> vfs_write+0xad/0x13d
> Jun 16 19:01:49 slurv kernel: [50823.272212] [<ffffffff810c8949>]
> sys_write+0x45/0x6c
> Jun 16 19:01:49 slurv kernel: [50823.272229] [<ffffffff8130a462>]
> system_call_fastpath+0x16/0x1b
> Jun 16 19:01:49 slurv kernel: [50823.272258] Code: 48 89 55 b8 48 89 75
> c0 e8 16 e4 1d 00 48 8b 43 78 c7 45 cc f0 ff ff ff 48 8b 55 b8 48 8b 75
> c0 a8 01 75 4f 48 8b 43 20 48 89 df <ff> 50 18 85 c0 89 45 cc 75 3e 48
> 8b 43 48 f0 ff 40 44 f0 80 4b
> Jun 16 19:01:49 slurv kernel: [50823.272317] RIP [<ffffffff81129872>]
> __key_instantiate_and_link+0x52/0xcb
> Jun 16 19:01:49 slurv kernel: [50823.272339] RSP <ffff88061941bd38>
> Jun 16 19:01:49 slurv kernel: [50823.279199] ---[ end trace
> 25122b5e9d0b0c76 ]---
>
> It did take a while this time.

It looks to me as if the legacy upcall code is assuming that there can
be no more than 1 upcall at a time: there is only a single
idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
and then read in idmap_pipe_downcall.

Bryan, can you look into this? I suspect that we need a mutex or
something like that (for the legacy upcall case only) to ensure that
nobody overwrites the idmap->idmap_key_cons while an upcall is in
progress.

Andre, if you want idmapper scalability, then you should rather use the
new idmapper upcall. You need a recent version of the nfs-utils package,
the keyutils package, and then you should add an 'id_resolver' line
to /etc/request-key.conf as per the nfsidmap manpage.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-06-18 13:29:05

by Myklebust, Trond

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On Mon, 2012-06-18 at 11:04 +0200, Andre Tomt wrote:
> On 16. juni 2012 21:59, Myklebust, Trond wrote:
> > It looks to me as if the legacy upcall code is assuming that there can
> > be no more than 1 upcall at a time: there is only a single
> > idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
> > and then read in idmap_pipe_downcall.
> >
> > Bryan, can you look into this? I suspect that we need a mutex or
> > something like that (for the legacy upcall case only) to ensure that
> > nobody overwrites the idmap->idmap_key_cons while an upcall is in
> > progress.
> >
> > Andre, if you want idmapper scalability, then you should rather use the
> > new idmapper upcall. You need a recent version of the nfs-utils package,
> > the keyutils package, and they you should add an 'id_resolver' line
> > to /etc/request-keys.conf as per the nfsidmap manpage.
>
> Indeed, using keyutils did avoid the crashes here, 40 hours and counting.
>
> Are there any downsides of having keyutils w/ id_resolver on by default
> in a distribution? Would it break older kernels or nfs-utils (just not
> getting used is fine, obviously)?

Older kernels aren't able to use the keyutils mechanism, so they will
still require you to run the idmapd daemon, but there should be no
problems with just enabling it in /etc/request-key.conf.
Fedora 17 is supposed to install the id_resolver by default.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
www.netapp.com

2012-06-16 18:43:34

by Andre Tomt

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

On 16. juni 2012 04:32, Andre Tomt wrote:
> FWIW; I am (still) seeing this exact same crash several times an hour on
> 3.4.3-rc1 on Ubuntu 12.04 client. Only that the same second it happens,
> all my three displays corrupts badly, becoming completely unreadable.
> Switching to console and back usually gets my desktop back on two of
> three displays, and sometimes it will need a full xserver restart.
> Obviously some memory corruption going on.
>
> One of the crashes also triggered a NX error:
> [20292.196332] kernel tried to execute NX-protected page - exploit
> attempt? (uid: 0)
>
> And after a while, things start locking up.
>
> It didn't really start happening until a few days ago though, I've been
> running 3.4 for since some -rc through all the stable releases. Perhaps
> server suddenly got a working idmapper or something? Its a debian
> unstable updated a couple times a month.
>
> Booting latest git master now, to see if any of the recent NFS fixes
> just pulled by Linus fixes anything (-rc2 had other showstopper nfs
> issues).

Just had it happen with 3.5-git as of a couple hours ago (last commit
a2c2df8672f55195f101d9251117aa59e358d296):
Jun 16 19:01:49 slurv kernel: [50823.271618] general protection fault:
0000 [#1] SMP
Jun 16 19:01:49 slurv kernel: [50823.271640] CPU 10
Jun 16 19:01:49 slurv kernel: [50823.271690]
Jun 16 19:01:49 slurv kernel: [50823.271698] Pid: 1678, comm: rpc.idmapd
Not tainted 3.5.0-1-desktop #1 System manufacturer System Product
Name/P6T DELUXE V2
Jun 16 19:01:49 slurv kernel: [50823.271738] RIP:
0010:[<ffffffff81129872>] [<ffffffff81129872>]
__key_instantiate_and_link+0x52/0xcb
Jun 16 19:01:49 slurv kernel: [50823.271767] RSP: 0018:ffff88061941bd38
EFLAGS: 00010246
Jun 16 19:01:49 slurv kernel: [50823.271785] RAX: 6337346330366233 RBX:
ffff880606fb97f0 RCX: 0000000000000000
Jun 16 19:01:49 slurv kernel: [50823.271807] RDX: 0000000000000006 RSI:
ffff88061941be85 RDI: ffff880606fb97f0
Jun 16 19:01:49 slurv kernel: [50823.271829] RBP: ffff88061941bd88 R08:
ffff8802bc9ed380 R09: ffff88061941bdb0
Jun 16 19:01:49 slurv kernel: [50823.271850] R10: 0000000000000000 R11:
0000000000000000 R12: ffff88061aad63c0
Jun 16 19:01:49 slurv kernel: [50823.271872] R13: ffff8802bc9ed380 R14:
0000000000000000 R15: ffff88061941bdb0
Jun 16 19:01:49 slurv kernel: [50823.271894] FS: 00007fd449a7d700(0000)
GS:ffff88063fd40000(0000) knlGS:0000000000000000
Jun 16 19:01:49 slurv kernel: [50823.271919] CS: 0010 DS: 0000 ES: 0000
CR0: 0000000080050033
Jun 16 19:01:49 slurv kernel: [50823.271937] CR2: 00007f8844002028 CR3:
000000061be95000 CR4: 00000000000007e0
Jun 16 19:01:49 slurv kernel: [50823.271959] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Jun 16 19:01:49 slurv kernel: [50823.271980] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Jun 16 19:01:49 slurv kernel: [50823.272003] Process rpc.idmapd (pid:
1678, threadinfo ffff88061941a000, task ffff88061b870000)
Jun 16 19:01:49 slurv kernel: [50823.272028] Stack:
Jun 16 19:01:49 slurv kernel: [50823.272036] ffff88061941bd78
0000000000000006 ffff88061941be85 fffffff01aad63e8
Jun 16 19:01:49 slurv kernel: [50823.272060] ffff880606fb9840
ffff880606fb97f0 ffff88061aad63c0 0000000000000006
Jun 16 19:01:49 slurv kernel: [50823.272083] ffff88061941be85
00007fffe09c4500 ffff88061941bdd8 ffffffff81129943
Jun 16 19:01:49 slurv kernel: [50823.272106] Call Trace:
Jun 16 19:01:49 slurv kernel: [50823.272116] [<ffffffff81129943>]
key_instantiate_and_link+0x58/0x80
Jun 16 19:01:49 slurv kernel: [50823.272145] [<ffffffffa058a906>]
idmap_pipe_downcall+0x154/0x1ad [nfs]
Jun 16 19:01:49 slurv kernel: [50823.272173] [<ffffffffa04aeae2>]
rpc_pipe_write+0x56/0x6f [sunrpc]
Jun 16 19:01:49 slurv kernel: [50823.272195] [<ffffffff810c86ce>]
vfs_write+0xad/0x13d
Jun 16 19:01:49 slurv kernel: [50823.272212] [<ffffffff810c8949>]
sys_write+0x45/0x6c
Jun 16 19:01:49 slurv kernel: [50823.272229] [<ffffffff8130a462>]
system_call_fastpath+0x16/0x1b
Jun 16 19:01:49 slurv kernel: [50823.272258] Code: 48 89 55 b8 48 89 75
c0 e8 16 e4 1d 00 48 8b 43 78 c7 45 cc f0 ff ff ff 48 8b 55 b8 48 8b 75
c0 a8 01 75 4f 48 8b 43 20 48 89 df <ff> 50 18 85 c0 89 45 cc 75 3e 48
8b 43 48 f0 ff 40 44 f0 80 4b
Jun 16 19:01:49 slurv kernel: [50823.272317] RIP [<ffffffff81129872>]
__key_instantiate_and_link+0x52/0xcb
Jun 16 19:01:49 slurv kernel: [50823.272339] RSP <ffff88061941bd38>
Jun 16 19:01:49 slurv kernel: [50823.279199] ---[ end trace
25122b5e9d0b0c76 ]---

It did take a while this time.
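
One detail worth noting in this trace: the Code: line marks the faulting bytes as `<ff> 50 18`, which I read as an indirect call through RAX+0x18, and RAX holds 6337346330366233. That would explain why this crash shows up as a general protection fault while the earlier one was a paging request failure. The check below is a small sketch, assuming 48-bit virtual addresses (4-level paging); `is_canonical_48` is an invented helper name.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * With 48-bit virtual addresses (4-level paging), an x86-64 address is
 * canonical when bits 63..47 are all zero or all one.
 */
bool is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47;      /* bits 63..47, 17 bits in total */

    return top == 0 || top == 0x1ffff;
}
```

Under that assumption, 0x6337346330366233 + 0x18 is non-canonical, so the CPU raises #GP before any page-table walk. The earlier fault address 0x0000632e6472616f is canonical (it sits in the user half of the address space), so there the failure surfaces as a page fault or NX violation instead. Both look like the same stale pointer being followed into different garbage.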

2012-06-20 18:27:31

by Anna Schumaker

Subject: Re: BUG in __key_instantiate_and_link(): unable to handle kernel paging request at 0000632e6472616f

Hi Andre,

Have you had a chance to test my patch (below) yet?

- Bryan

On 06/18/2012 12:11 PM, Bryan Schumaker wrote:
> On 06/18/2012 09:28 AM, Myklebust, Trond wrote:
>> On Mon, 2012-06-18 at 11:04 +0200, Andre Tomt wrote:
>>> On 16. juni 2012 21:59, Myklebust, Trond wrote:
>>>> It looks to me as if the legacy upcall code is assuming that there can
>>>> be no more than 1 upcall at a time: there is only a single
>>>> idmap->idmap_key_cons, which gets assigned in nfs_idmap_legacy_upcall
>>>> and then read in idmap_pipe_downcall.
>>>>
>>>> Bryan, can you look into this? I suspect that we need a mutex or
>>>> something like that (for the legacy upcall case only) to ensure that
>>>> nobody overwrites the idmap->idmap_key_cons while an upcall is in
>>>> progress.
>>>>
>>>> Andre, if you want idmapper scalability, then you should rather use the
>>>> new idmapper upcall. You need a recent version of the nfs-utils package,
>>>> the keyutils package, and they you should add an 'id_resolver' line
>>>> to /etc/request-keys.conf as per the nfsidmap manpage.
>>>
>>> Indeed, using keyutils did avoid the crashes here, 40 hours and counting.
>>>
>>> Are there any downsides of having keyutils w/ id_resolver on by default
>>> in a distribution? Would it break older kernels or nfs-utils (just not
>>> getting used is fine, obviously)?
>>
>> Older kernels aren't able to use the keyutils mechanism, so they will
>> still require you to run the idmapd daemon, but there should be no
>> problems with just enabling it in /etc/request-key.conf.
>> Fedora 17 is supposed to install the id_resolver by default.
>>
>
> Hi Andre,
>
> Can you please check if this patch fixes the old idmapper?
>
> - Bryan
>
> From 3bef58765c7308965d06d9e9d7707c7ca55648ee Mon Sep 17 00:00:00 2001
> From: Bryan Schumaker <[email protected]>
> Date: Mon, 18 Jun 2012 12:01:25 -0400
> Subject: [PATCH] NFS: Force the legacy idmapper to be single threaded
>
> It was initially coded under the assumption that there would only be one
> request at a time, so use a lock to enforce this requirement..
>
> Signed-off-by: Bryan Schumaker <[email protected]>
> ---
> fs/nfs/idmap.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfs/idmap.c b/fs/nfs/idmap.c
> index b5b86a0..864c51e 100644
> --- a/fs/nfs/idmap.c
> +++ b/fs/nfs/idmap.c
> @@ -57,6 +57,11 @@ unsigned int nfs_idmap_cache_timeout = 600;
> static const struct cred *id_resolver_cache;
> static struct key_type key_type_id_resolver_legacy;
>
> +struct idmap {
> + struct rpc_pipe *idmap_pipe;
> + struct key_construction *idmap_key_cons;
> + struct mutex idmap_mutex;
> +};
>
> /**
> * nfs_fattr_init_names - initialise the nfs_fattr owner_name/group_name fields
> @@ -310,9 +315,11 @@ static ssize_t nfs_idmap_get_key(const char *name, size_t namelen,
> name, namelen, type, data,
> data_size, NULL);
> if (ret < 0) {
> + mutex_lock(&idmap->idmap_mutex);
> ret = nfs_idmap_request_key(&key_type_id_resolver_legacy,
> name, namelen, type, data,
> data_size, idmap);
> + mutex_unlock(&idmap->idmap_mutex);
> }
> return ret;
> }
> @@ -354,11 +361,6 @@ static int nfs_idmap_lookup_id(const char *name, size_t namelen, const char *typ
> /* idmap classic begins here */
> module_param(nfs_idmap_cache_timeout, int, 0644);
>
> -struct idmap {
> - struct rpc_pipe *idmap_pipe;
> - struct key_construction *idmap_key_cons;
> -};
> -
> enum {
> Opt_find_uid, Opt_find_gid, Opt_find_user, Opt_find_group, Opt_find_err
> };
> @@ -469,6 +471,7 @@ nfs_idmap_new(struct nfs_client *clp)
> return error;
> }
> idmap->idmap_pipe = pipe;
> + mutex_init(&idmap->idmap_mutex);
>
> clp->cl_idmap = idmap;
> return 0;
>