2012-06-24 21:34:18

by Andi Kleen

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

On Sat, Jun 23, 2012 at 08:26:04PM +0800, Fengguang Wu wrote:
> Hi,
>
> When doing parallel kernel builds on an NFSROOT system with 32 logical
> CPUs, I see half CPU time spent in kernel:
>
> top - 20:08:28 up 5:01, 8 users, load average: 62.97, 55.82, 54.56
> Tasks: 751 total, 80 running, 671 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.1%us, 53.5%sy, 46.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
> Mem: 65401132k total, 36455604k used, 28945528k free, 0k buffers
> Swap: 0k total, 0k used, 0k free, 31858244k cached
>
> The attached lock_stat shows that rpcauth_lookup_credcache() has the
> outstanding contentions.

Hmm, the cache is already using RCU for the read case, just takes
a lock for writing. Or when checking an entry.

So you either have a lot of lookups on the same entry, or you thrash
the cache.

When you look up the look address, is it the one in the first loop?
If yes then it's the first.

I'm not fully sure why it takes the lock in the read case anyways. Neither
test_bit nor atomic_inc under RCU should need a lock, so I suppose
it can be just removed.

367 spin_lock(&cache->lock);
368 if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
369 spin_unlock(&cache->lock);
370 continue;
371 }
372 cred = get_rpccred(entry);
373 spin_unlock(&cache->lock);

and get_rpccred is just

154 static inline
155 struct rpc_cred * get_rpccred(struct rpc_cred *cred)
156 {
157 atomic_inc(&cred->cr_count);
158 return cred;
159 }

Can you try this patch?

-andi


commit fa1eef2ec22f2fc31e0381b864044fbb753dd572
Author: Andi Kleen <[email protected]>
Date: Sun Jun 24 14:31:06 2012 -0700

sunrpc: remove useless spinlocks in credential lookup path

Fengguang noticed that rpcauth_lookup_credcache has high lock contention
on the nfs server when doing kernel builds on nfsroot.

There is no reason to take the spinlock in the read loop: the RCU
makes sure the object does not go away, and either test_bit
nor atomic_inc in get_rpccred() needs a lock.

So just remove the spinlock in the read lookup path.

Reported-by: Fengguang Wu
Signed-off-by: Andi Kleen <[email protected]>

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 727e506..13c2b58 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -364,13 +364,10 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
if (!entry->cr_ops->crmatch(acred, entry, flags))
continue;
- spin_lock(&cache->lock);
if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
- spin_unlock(&cache->lock);
continue;
}
cred = get_rpccred(entry);
- spin_unlock(&cache->lock);
break;
}
rcu_read_unlock();


2012-06-27 18:36:43

by Myklebust, Trond

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

T24gV2VkLCAyMDEyLTA2LTI3IGF0IDExOjAzIC0wNzAwLCBBbmRpIEtsZWVuIHdyb3RlOg0KPiA+
ID4gQ2FuIHlvdSB0cnkgdGhpcyBwYXRjaD8NCj4gPiANCj4gPiBUaGFuayB5b3UhIFRoaXMgcGF0
Y2ggYnJpbmdzICVzeXMgZG93biB0byAyNCUsIGEgcHJldHR5IGxhcmdlIGltcHJvdmVtZW50IQ0K
PiANCj4gSSByZWRpZCB0aGUgcGF0Y2ggbm93IGFuZCBmaXhlZCB0aGUgcmFjZSBUcm9uZCBwb2lu
dGVkIG91dC4NCj4gDQo+IEZlbmdndWFuZywgY2FuIHlvdSByZXRlc3QgcGxlYXNlPyBUcm9uZCwg
ZG9lcyBpdCBsb29rIGdvb2Qgbm93Pw0KDQpIaSBBbmRpLA0KDQpUaGVyZSBpcyBzdGlsbCBhIHJh
Y2UgdGhlcmU6IGlmIHlvdXIgdGhyZWFkIGRvZXMgYXRvbWljX2FkZF9yZXR1cm4oKSA9PQ0KMSwg
dGhlbiBhIHNlY29uZCB0aHJlYWQgY2FuIHNjYW4sIGRvIHRoZSBhdG9taWNfYWRkX3JldHVybigp
IGFuZCBnZXQgYQ0KdmFsdWUgb2YgMiwgaW4gd2hpY2ggY2FzZSBpdCB3b24ndCBjaGVjayBmb3Ig
SEFTSEVELi4uDQoNCkNoZWVycw0KICBUcm9uZA0KDQo+IC0tLQ0KPiANCj4gRnJvbTogQW5kaSBL
bGVlbiA8YWtAbGludXguaW50ZWwuY29tPg0KPiBEYXRlOiBTdW4sIDI0IEp1biAyMDEyIDE0OjMx
OjA2IC0wNzAwDQo+IFN1YmplY3Q6IFtQQVRDSF0gc3VucnBjOiBJbXByb3ZlIHNwaW5sb2NrcyBp
biBjcmVkZW50aWFsIGxvb2t1cCBwYXRoIHYyDQo+IA0KPiBGZW5nZ3Vhbmcgbm90aWNlZCB0aGF0
IHJwY2F1dGhfbG9va3VwX2NyZWRjYWNoZSBoYXMgaGlnaCBsb2NrIGNvbnRlbnRpb24NCj4gb24g
dGhlIG5mcyBzZXJ2ZXIgd2hlbiBkb2luZyBrZXJuZWwgYnVpbGRzIG9uIG5mc3Jvb3QuDQo+IA0K
PiBUaGUgb25seSByZWFzb24gdGhlIHNwaW5sb2NrIGlzIG5lZWRlZCBpbiB0aGUgbG9ja3VwIHBh
dGggaXMgdG8NCj4gc3luY2hyb25pemUgd2l0aCB0aGUgZ2FyYmFnZSBjb2xsZWN0b3IuIEZpcnN0
IHRoZSBvYmplY3QgaXRzZWxmDQo+IGlzIHN0YWJsZSBiZWNhdXNlIGl0J3MgUkNVJ2VkLg0KPiAN
Cj4gU28gbm93IHdlIHVzZSBhbiBhdG9taWNfYWRkX3JldHVybiBhbmQgY2hlY2sgZm9yIHRoZSAw
IHJlZmVyZW5jZQ0KPiBjb3VudCBjYXNlLiAgV2hlbiB0aGUgY291bnQgd2FzIDAgYXNzdW1lIHRo
ZSBnYXJiYWdlIGNvbGxlY3Rvcg0KPiBpcyBhY3RpdmUgb24gdGhpcyBlbnRyeSBhbmQgdGFrZSB0
aGUgbG9jayBhbmQgcmVjaGVjay4NCj4gDQo+IE90aGVyd2lzZSB0aGUgcGF0aCBpcyBsb2NrbGVz
cy4NCj4gDQo+IHYyOiBBZGQgR0MgcmFjZSBjaGVjayBiYXNlZCBvbiBUcm9uZCdzIGZlZWRiYWNr
DQo+IENjOiBUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0KPiBSZXBvcnRlZC1ieTogRmVuZ2d1
YW5nIFd1DQo+IFNpZ25lZC1vZmYtYnk6IEFuZGkgS2xlZW4gPGFrQGxpbnV4LmludGVsLmNvbT4N
Cj4gLS0tDQo+ICBuZXQvc3VucnBjL2F1dGguYyB8ICAgMTkgKysrKysrKysrKysrKysrLS0tLQ0K
PiAgMSBmaWxlcyBjaGFuZ2VkLCAxNSBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQ0KPiAN
Cj4gZGlmZiAtLWdpdCBhL25ldC9zdW5ycGMvYXV0aC5jIGIvbmV0L3N1bnJwYy9hdXRoLmMNCj4g
aW5kZXggNzI3ZTUwNi4uNjQ1NzY5ZSAxMDA2NDQNCj4gLS0tIGEvbmV0L3N1bnJwYy9hdXRoLmMN
Cj4gKysrIGIvbmV0L3N1bnJwYy9hdXRoLmMNCj4gQEAgLTM2NCwxMyArMzY0LDI0IEBAIHJwY2F1
dGhfbG9va3VwX2NyZWRjYWNoZShzdHJ1Y3QgcnBjX2F1dGggKmF1dGgsIHN0cnVjdCBhdXRoX2Ny
ZWQgKiBhY3JlZCwNCj4gIAlobGlzdF9mb3JfZWFjaF9lbnRyeV9yY3UoZW50cnksIHBvcywgJmNh
Y2hlLT5oYXNodGFibGVbbnJdLCBjcl9oYXNoKSB7DQo+ICAJCWlmICghZW50cnktPmNyX29wcy0+
Y3JtYXRjaChhY3JlZCwgZW50cnksIGZsYWdzKSkNCj4gIAkJCWNvbnRpbnVlOw0KPiAtCQlzcGlu
X2xvY2soJmNhY2hlLT5sb2NrKTsNCj4gIAkJaWYgKHRlc3RfYml0KFJQQ0FVVEhfQ1JFRF9IQVNI
RUQsICZlbnRyeS0+Y3JfZmxhZ3MpID09IDApIHsNCj4gLQkJCXNwaW5fdW5sb2NrKCZjYWNoZS0+
bG9jayk7DQo+ICAJCQljb250aW51ZTsNCj4gIAkJfQ0KPiAtCQljcmVkID0gZ2V0X3JwY2NyZWQo
ZW50cnkpOw0KPiAtCQlzcGluX3VubG9jaygmY2FjaGUtPmxvY2spOw0KPiArCQljcmVkID0gZW50
cnk7DQo+ICsJCWlmIChhdG9taWNfYWRkX3JldHVybigxLCAmY3JlZC0+Y3JfY291bnQpID09IDEp
IHsNCj4gKwkJCS8qDQo+ICsJCQkgKiBXaGVuIHRoZSBjb3VudCB3YXMgMCB3ZSBjb3VsZCByYWNl
IHdpdGggdGhlIEdDLg0KPiArCQkJICogVGFrZSB0aGUgbG9jayBhbmQgcmVjaGVjay4NCj4gKwkJ
CSAqLw0KPiArCQkJc3Bpbl9sb2NrKCZjYWNoZS0+bG9jayk7DQo+ICsJCQlpZiAoIXRlc3RfYml0
KFJQQ0FVVEhfQ1JFRF9IQVNIRUQsICZlbnRyeS0+Y3JfZmxhZ3MpKSB7DQo+ICsJCQkJLyogTG9z
dCB0aGUgcmFjZS4gKi8NCj4gKwkJCQlhdG9taWNfZGVjKCZjcmVkLT5jcl9jb3VudCk7DQo+ICsJ
CQkJc3Bpbl91bmxvY2soJmNhY2hlLT5sb2NrKTsNCj4gKwkJCQljb250aW51ZTsNCj4gKwkJCX0N
Cj4gKwkJCXNwaW5fdW5sb2NrKCZjYWNoZS0+bG9jayk7DQo+ICsJCX0NCj4gIAkJYnJlYWs7DQo+
ICAJfQ0KPiAgCXJjdV9yZWFkX3VubG9jaygpOw0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGlu
dXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFw
cC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg==

2012-06-27 18:03:53

by Andi Kleen

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

> > Can you try this patch?
>
> Thank you! This patch brings %sys down to 24%, a pretty large improvement!

I redid the patch now and fixed the race Trond pointed out.

Fengguang, can you retest please? Trond, does it look good now?

---

From: Andi Kleen <[email protected]>
Date: Sun, 24 Jun 2012 14:31:06 -0700
Subject: [PATCH] sunrpc: Improve spinlocks in credential lookup path v2

Fengguang noticed that rpcauth_lookup_credcache has high lock contention
on the nfs server when doing kernel builds on nfsroot.

The only reason the spinlock is needed in the lockup path is to
synchronize with the garbage collector. First the object itself
is stable because it's RCU'ed.

So now we use an atomic_add_return and check for the 0 reference
count case. When the count was 0 assume the garbage collector
is active on this entry and take the lock and recheck.

Otherwise the path is lockless.

v2: Add GC race check based on Trond's feedback
Cc: [email protected]
Reported-by: Fengguang Wu
Signed-off-by: Andi Kleen <[email protected]>
---
net/sunrpc/auth.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 727e506..645769e 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
if (!entry->cr_ops->crmatch(acred, entry, flags))
continue;
- spin_lock(&cache->lock);
if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
- spin_unlock(&cache->lock);
continue;
}
- cred = get_rpccred(entry);
- spin_unlock(&cache->lock);
+ cred = entry;
+ if (atomic_add_return(1, &cred->cr_count) == 1) {
+ /*
+ * When the count was 0 we could race with the GC.
+ * Take the lock and recheck.
+ */
+ spin_lock(&cache->lock);
+ if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) {
+ /* Lost the race. */
+ atomic_dec(&cred->cr_count);
+ spin_unlock(&cache->lock);
+ continue;
+ }
+ spin_unlock(&cache->lock);
+ }
break;
}
rcu_read_unlock();
--
1.7.7


2012-06-25 02:42:18

by Fengguang Wu

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

On Sun, Jun 24, 2012 at 02:34:17PM -0700, Andi Kleen wrote:
> On Sat, Jun 23, 2012 at 08:26:04PM +0800, Fengguang Wu wrote:
> > Hi,
> >
> > When doing parallel kernel builds on an NFSROOT system with 32 logical
> > CPUs, I see half CPU time spent in kernel:
> >
> > top - 20:08:28 up 5:01, 8 users, load average: 62.97, 55.82, 54.56
> > Tasks: 751 total, 80 running, 671 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 0.1%us, 53.5%sy, 46.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
> > Mem: 65401132k total, 36455604k used, 28945528k free, 0k buffers
> > Swap: 0k total, 0k used, 0k free, 31858244k cached
> >
> > The attached lock_stat shows that rpcauth_lookup_credcache() has the
> > outstanding contentions.
>
> Hmm, the cache is already using RCU for the read case, just takes
> a lock for writing. Or when checking an entry.
>
> So you either have a lot of lookups on the same entry, or you thrash
> the cache.
>
> When you look up the look address, is it the one in the first loop?
> If yes then it's the first.
>
> I'm not fully sure why it takes the lock in the read case anyways. Neither
> test_bit nor atomic_inc under RCU should need a lock, so I suppose
> it can be just removed.
>
> 367 spin_lock(&cache->lock);
> 368 if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
> 369 spin_unlock(&cache->lock);
> 370 continue;
> 371 }
> 372 cred = get_rpccred(entry);
> 373 spin_unlock(&cache->lock);
>
> and get_rpccred is just
>
> 154 static inline
> 155 struct rpc_cred * get_rpccred(struct rpc_cred *cred)
> 156 {
> 157 atomic_inc(&cred->cr_count);
> 158 return cred;
> 159 }
>
> Can you try this patch?

Thank you! This patch brings %sys down to 24%, a pretty large improvement!

root@snb /home/wfg# vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
78 0 0 21294520 0 8822056 0 0 0 0 46 56 7 3 90 0
71 0 0 21312104 0 8832776 0 0 0 0 32831 14960 76 24 0 0
76 0 0 21203800 0 8844552 0 0 0 0 32892 15100 75 25 0 0
73 0 0 21019004 0 8856028 0 0 0 0 32837 14770 75 25 0 0
72 0 0 21021084 0 8865996 0 0 0 0 32841 14901 76 24 0 0
76 0 0 20907260 0 8876084 0 0 0 0 32678 14643 76 24 0 0
68 0 0 20936304 0 8886732 0 0 0 0 32724 14858 78 22 0 0
72 0 0 20889164 0 8897520 0 0 0 0 32732 14813 76 24 0 0
81 0 0 20742408 0 8907368 0 0 0 0 32745 14669 77 23 0 0
70 0 0 20662060 0 8917840 0 0 0 0 32748 14751 76 24 0 0
74 0 0 21035616 0 8927836 0 0 0 0 32781 14856 77 23 0 0
^C

It completely removes the function from lock_stat. Now the new
contented functions are

class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

&(&dentry->d_lock)->rlock: 197445146 201224451 0.10 2777.33 461033150.94 1345525784 6033591257 0.08 321.13 1158851748.52
-------------------------
&(&dentry->d_lock)->rlock 111609824 [<ffffffff811609f1>] dput+0x33/0x169
&(&dentry->d_lock)->rlock 18819236 [<ffffffff81156b86>] path_get+0x2c/0x3c
&(&dentry->d_lock)->rlock 17621335 [<ffffffff81156c20>] unlazy_walk+0x8a/0x1ac
&(&dentry->d_lock)->rlock 34148964 [<ffffffff81160233>] dget_parent+0x4b/0xaf
-------------------------
&(&dentry->d_lock)->rlock 14495017 [<ffffffff81156b86>] path_get+0x2c/0x3c
&(&dentry->d_lock)->rlock 89313409 [<ffffffff811609f1>] dput+0x33/0x169
&(&dentry->d_lock)->rlock 19487816 [<ffffffff81156c20>] unlazy_walk+0x8a/0x1ac
&(&dentry->d_lock)->rlock 59151508 [<ffffffff81160233>] dget_parent+0x4b/0xaf

...............................................................................................................................................................................................

&sb->s_type->i_lock_key#1: 24313829 24330907 0.09 185.01 22864745.70 572924538 1422754983 0.08 244.15 852837006.69
-------------------------
&sb->s_type->i_lock_key#1 24263909 [<ffffffff81244ad9>] nfs_do_access+0x2b/0x350
&sb->s_type->i_lock_key#1 1438 [<ffffffff8125069d>] nfs_unblock_sillyrename+0x9b/0xce
&sb->s_type->i_lock_key#1 24986 [<ffffffff81247049>] nfs_file_set_open_context+0x3a/0x62
&sb->s_type->i_lock_key#1 18689 [<ffffffff813f8285>] _atomic_dec_and_lock+0x39/0x50
-------------------------
&sb->s_type->i_lock_key#1 24280596 [<ffffffff81244ad9>] nfs_do_access+0x2b/0x350
&sb->s_type->i_lock_key#1 578 [<ffffffff8125069d>] nfs_unblock_sillyrename+0x9b/0xce
&sb->s_type->i_lock_key#1 8602 [<ffffffff813f8285>] _atomic_dec_and_lock+0x39/0x50
&sb->s_type->i_lock_key#1 15649 [<ffffffff81247186>] nfs_release+0x47/0x93

Thanks,
Fengguang

> commit fa1eef2ec22f2fc31e0381b864044fbb753dd572
> Author: Andi Kleen <[email protected]>
> Date: Sun Jun 24 14:31:06 2012 -0700
>
> sunrpc: remove useless spinlocks in credential lookup path
>
> Fengguang noticed that rpcauth_lookup_credcache has high lock contention
> on the nfs server when doing kernel builds on nfsroot.
>
> There is no reason to take the spinlock in the read loop: the RCU
> makes sure the object does not go away, and either test_bit
> nor atomic_inc in get_rpccred() needs a lock.
>
> So just remove the spinlock in the read lookup path.
>
> Reported-by: Fengguang Wu
> Signed-off-by: Andi Kleen <[email protected]>
>
> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> index 727e506..13c2b58 100644
> --- a/net/sunrpc/auth.c
> +++ b/net/sunrpc/auth.c
> @@ -364,13 +364,10 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
> hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
> if (!entry->cr_ops->crmatch(acred, entry, flags))
> continue;
> - spin_lock(&cache->lock);
> if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
> - spin_unlock(&cache->lock);
> continue;
> }
> cred = get_rpccred(entry);
> - spin_unlock(&cache->lock);
> break;
> }
> rcu_read_unlock();

2012-06-25 02:45:52

by Andi Kleen

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

> I believe that the spinlock is there in order to ensure that the
> test_bit() and get_rpccred() are atomic w.r.t. garbage collection.

Normally RCU should ensure that already with minor care.
Only drop after a quiescent period and when the reference is still 0.
And drop always when the refcount goes to 0.

-Andi


--
[email protected] -- Speaking for myself only

2012-06-25 01:21:47

by Myklebust, Trond

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

T24gU3VuLCAyMDEyLTA2LTI0IGF0IDE0OjM0IC0wNzAwLCBBbmRpIEtsZWVuIHdyb3RlOg0KPiBP
biBTYXQsIEp1biAyMywgMjAxMiBhdCAwODoyNjowNFBNICswODAwLCBGZW5nZ3VhbmcgV3Ugd3Jv
dGU6DQo+ID4gSGksDQo+ID4gDQo+ID4gV2hlbiBkb2luZyBwYXJhbGxlbCBrZXJuZWwgYnVpbGRz
IG9uIGFuIE5GU1JPT1Qgc3lzdGVtIHdpdGggMzIgbG9naWNhbA0KPiA+IENQVXMsIEkgc2VlIGhh
bGYgQ1BVIHRpbWUgc3BlbnQgaW4ga2VybmVsOg0KPiA+IA0KPiA+IHRvcCAtIDIwOjA4OjI4IHVw
ICA1OjAxLCAgOCB1c2VycywgIGxvYWQgYXZlcmFnZTogNjIuOTcsIDU1LjgyLCA1NC41Ng0KPiA+
IFRhc2tzOiA3NTEgdG90YWwsICA4MCBydW5uaW5nLCA2NzEgc2xlZXBpbmcsICAgMCBzdG9wcGVk
LCAgIDAgem9tYmllDQo+ID4gQ3B1KHMpOiAgMC4xJXVzLCA1My41JXN5LCA0Ni40JW5pLCAgMC4w
JWlkLCAgMC4wJXdhLCAgMC4wJWhpLCAgMC4xJXNpLCAgMC4wJXN0DQo+ID4gTWVtOiAgNjU0MDEx
MzJrIHRvdGFsLCAzNjQ1NTYwNGsgdXNlZCwgMjg5NDU1MjhrIGZyZWUsICAgICAgICAwayBidWZm
ZXJzDQo+ID4gU3dhcDogICAgICAgIDBrIHRvdGFsLCAgICAgICAgMGsgdXNlZCwgICAgICAgIDBr
IGZyZWUsIDMxODU4MjQ0ayBjYWNoZWQNCj4gPiANCj4gPiBUaGUgYXR0YWNoZWQgbG9ja19zdGF0
IHNob3dzIHRoYXQgcnBjYXV0aF9sb29rdXBfY3JlZGNhY2hlKCkgaGFzIHRoZQ0KPiA+IG91dHN0
YW5kaW5nIGNvbnRlbnRpb25zLg0KPiANCj4gSG1tLCB0aGUgY2FjaGUgaXMgYWxyZWFkeSB1c2lu
ZyBSQ1UgZm9yIHRoZSByZWFkIGNhc2UsIGp1c3QgdGFrZXMNCj4gYSBsb2NrIGZvciB3cml0aW5n
LiBPciB3aGVuIGNoZWNraW5nIGFuIGVudHJ5Lg0KPiANCj4gU28geW91IGVpdGhlciBoYXZlIGEg
bG90IG9mIGxvb2t1cHMgb24gdGhlIHNhbWUgZW50cnksIG9yIHlvdSB0aHJhc2gNCj4gdGhlIGNh
Y2hlLg0KPiANCj4gV2hlbiB5b3UgbG9vayB1cCB0aGUgbG9vayBhZGRyZXNzLCBpcyBpdCB0aGUg
b25lIGluIHRoZSBmaXJzdCBsb29wPw0KPiBJZiB5ZXMgdGhlbiBpdCdzIHRoZSBmaXJzdC4NCj4g
DQo+IEknbSBub3QgZnVsbHkgc3VyZSB3aHkgaXQgdGFrZXMgdGhlIGxvY2sgaW4gdGhlIHJlYWQg
Y2FzZSBhbnl3YXlzLiBOZWl0aGVyDQo+IHRlc3RfYml0IG5vciBhdG9taWNfaW5jIHVuZGVyIFJD
VSBzaG91bGQgbmVlZCBhIGxvY2ssIHNvIEkgc3VwcG9zZQ0KPiBpdCBjYW4gYmUganVzdCByZW1v
dmVkLg0KPiANCj4gMzY3ICAgICAgICAgICAgICAgICBzcGluX2xvY2soJmNhY2hlLT5sb2NrKTsN
Cj4gMzY4ICAgICAgICAgICAgICAgICBpZiAodGVzdF9iaXQoUlBDQVVUSF9DUkVEX0hBU0hFRCwg
JmVudHJ5LT5jcl9mbGFncykgPT0gMCkgew0KPiAzNjkgICAgICAgICAgICAgICAgICAgICAgICAg
c3Bpbl91bmxvY2soJmNhY2hlLT5sb2NrKTsNCj4gMzcwICAgICAgICAgICAgICAgICAgICAgICAg
IGNvbnRpbnVlOw0KPiAzNzEgICAgICAgICAgICAgICAgIH0NCj4gMzcyICAgICAgICAgICAgICAg
ICBjcmVkID0gZ2V0X3JwY2NyZWQoZW50cnkpOw0KPiAzNzMgICAgICAgICAgICAgICAgIHNwaW5f
dW5sb2NrKCZjYWNoZS0+bG9jayk7DQoNCkhpIEFuZGksDQoNCkkgYmVsaWV2ZSB0aGF0IHRoZSBz
cGlubG9jayBpcyB0aGVyZSBpbiBvcmRlciB0byBlbnN1cmUgdGhhdCB0aGUNCnRlc3RfYml0KCkg
YW5kIGdldF9ycGNjcmVkKCkgYXJlIGF0b21pYyB3LnIudC4gZ2FyYmFnZSBjb2xsZWN0aW9uLg0K
DQpXaGF0IHdlIGNvdWxkIHBlcmhhcHMgZG8gaXMgY3JlYXRlIGEgbG9jay1sZXNzIGZhc3QgcGF0
aC4gRm9yIGluc3RhbmNlDQp1c2UgYXRvbWljX2luY191bmxlc3NfemVybyB0byBncmFiIGEgcmVm
ZXJlbmNlIGluIGNhc2VzIHdoZXJlIHdlIGtub3cNCnRoYXQgdGhlIGdhcmJhZ2UgY29sbGVjdG9y
IGNhbid0IGFjdCBvbiB0aGlzIGNyZWQuIFdlJ2Qgc3RpbGwgbmVlZCB0bw0KdGVzdCBmb3IgUlBD
QVVUSF9DUkVEX0hBU0hFRCwgYnV0IHNpbmNlIHdlJ2QgYmUgaG9sZGluZyBhIHJlZmVyZW5jZSB3
ZQ0KY291bGQgZG8gdGhhdCBvdXRzaWRlIHRoZSBsb2NrLi4uDQoNCkNoZWVycw0KICBUcm9uZA0K
LS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRB
cHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0KDQo=

2012-07-05 15:13:01

by Malahal Naineni

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

Fengguang Wu [[email protected]] wrote:
> > diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> > index 727e506..645769e 100644
> > --- a/net/sunrpc/auth.c
> > +++ b/net/sunrpc/auth.c
> > @@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
> > hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
> > if (!entry->cr_ops->crmatch(acred, entry, flags))
> > continue;
> > - spin_lock(&cache->lock);
> > if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
> > - spin_unlock(&cache->lock);
> > continue;
> > }
> > - cred = get_rpccred(entry);
> > - spin_unlock(&cache->lock);
> > + cred = entry;

I know very little about this code, so take this with grain of salt!
The cred symbol is set above but there is no guaranty that you will use
it. If you don't win, you should set it to NULL.

> > + if (atomic_add_return(1, &cred->cr_count) == 1) {
> > + /*
> > + * When the count was 0 we could race with the GC.
> > + * Take the lock and recheck.
> > + */
> > + spin_lock(&cache->lock);
> > + if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) {
> > + /* Lost the race. */
> > + atomic_dec(&cred->cr_count);
> > + spin_unlock(&cache->lock);
> > + continue;

If you execute the above "continue" statement, you will have stale "cred"
where the original code may end up with NULL cred. I don't know if that
can happen really though.

> > + }
> > + spin_unlock(&cache->lock);
> > + }
> > break;

Setting cred to entry just before the above break and using entry in the
above code should be fine, I believe.

> > }
> > rcu_read_unlock();
> > --
> > 1.7.7
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>


2012-07-05 16:29:04

by Andi Kleen

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

> I know very little about this code, so take this with grain of salt!
> The cred symbol is set above but there is no guaranty that you will use
> it. If you don't win, you should set it to NULL.

Yes you're probably right. Thanks.

-Andi


2012-07-05 13:52:40

by Fengguang Wu

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

On Wed, Jun 27, 2012 at 11:03:53AM -0700, Andi Kleen wrote:
> > > Can you try this patch?
> >
> > Thank you! This patch brings %sys down to 24%, a pretty large improvement!
>
> I redid the patch now and fixed the race Trond pointed out.
>
> Fengguang, can you retest please? Trond, does it look good now?

Andi, this patch performs equally well! And it runs stable for over a week.

wfg@snb ~% vmstat 3
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
78 0 0 14084304 0 15489456 0 0 0 0 2 5 43 17 39 0
76 0 0 14077684 0 15501356 0 0 0 0 32553 14948 76 24 0 0
100 0 0 14079660 0 15513388 0 0 0 0 32562 14961 75 25 0 0
120 0 0 12934440 0 15523132 0 0 0 0 32780 15917 74 26 0 0
118 0 0 12845200 0 15531784 0 0 0 0 32495 14932 78 22 0 0
108 0 0 13004212 0 15543092 0 0 0 0 32541 15020 78 22 0 0
120 0 0 13114864 0 15551964 0 0 0 0 32502 14900 78 22 0 0
100 0 0 13305164 0 15562040 0 0 0 0 32497 14980 77 23 0 0
95 0 0 13094524 0 15573836 0 0 0 0 32551 14992 78 22 0 0


Now the most contented locks are

class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acqui
sitions holdtime-min holdtime-max holdtime-total
------------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------

&(&zone->lru_lock)->rlock: 766782640 767439846 0.09 359.51 1315341516.73 6714323642 1003
4569711 0.10 13565.21 22972522899.23
-------------------------
&(&zone->lru_lock)->rlock 403547412 [<ffffffff8110dcf7>] release_pages+0xd9/0x197
&(&zone->lru_lock)->rlock 363659746 [<ffffffff8110de31>] pagevec_lru_move_fn+0x7c/0xd1
&(&zone->lru_lock)->rlock 232683 [<ffffffff8110d965>] __page_cache_release.part.8+0x47/0xcc
&(&zone->lru_lock)->rlock 4 [<ffffffff811126c8>] isolate_lru_page+0x66/0x118
-------------------------
&(&zone->lru_lock)->rlock 402927785 [<ffffffff8110dcf7>] release_pages+0xd9/0x197
&(&zone->lru_lock)->rlock 364420999 [<ffffffff8110de31>] pagevec_lru_move_fn+0x7c/0xd1
&(&zone->lru_lock)->rlock 91058 [<ffffffff8110d965>] __page_cache_release.part.8+0x47/0xcc
&(&zone->lru_lock)->rlock 3 [<ffffffff8110e269>] add_page_to_unevictable_list+0x43/0x90

..........................................................................................................................................
.....................................................

cpufreq_driver_lock: 443785151 443792591 0.14 60.87 7083609116.92 511710115 51
3985181 0.11 31.25 303483059.29
-------------------
cpufreq_driver_lock 443792591 [<ffffffff817d5eef>] cpufreq_cpu_get+0x22/0x9f
-------------------
cpufreq_driver_lock 443792591 [<ffffffff817d5eef>] cpufreq_cpu_get+0x22/0x9f

..........................................................................................................................................
.....................................................

rcu_node_level_1: 333497521 345451048 0.10 38.72 742394589.16 676477679 175
3637221 0.00 24.74 698337725.87
----------------
rcu_node_level_1 320813377 [<ffffffff810d6bc2>] rcu_report_qs_rdp.isra.26+0x25/0x78
rcu_node_level_1 23968287 [<ffffffff810d6ac1>] force_qs_rnp+0x3b/0x117
rcu_node_level_1 35937 [<ffffffff810d69d8>] rcu_report_qs_rnp+0x1ed/0x29b
rcu_node_level_1 633447 [<ffffffff810d6668>] rcu_start_gp+0x158/0x2db
----------------
rcu_node_level_1 278686102 [<ffffffff810d6bc2>] rcu_report_qs_rdp.isra.26+0x25/0x78
rcu_node_level_1 31540615 [<ffffffff810d6ac1>] force_qs_rnp+0x3b/0x117
rcu_node_level_1 634856 [<ffffffff810d69d8>] rcu_report_qs_rnp+0x1ed/0x29b
rcu_node_level_1 34589475 [<ffffffff810d6668>] rcu_start_gp+0x158/0x2db

Thanks,
Fengguang

> ---
>
> From: Andi Kleen <[email protected]>
> Date: Sun, 24 Jun 2012 14:31:06 -0700
> Subject: [PATCH] sunrpc: Improve spinlocks in credential lookup path v2
>
> Fengguang noticed that rpcauth_lookup_credcache has high lock contention
> on the nfs server when doing kernel builds on nfsroot.
>
> The only reason the spinlock is needed in the lockup path is to
> synchronize with the garbage collector. First the object itself
> is stable because it's RCU'ed.
>
> So now we use an atomic_add_return and check for the 0 reference
> count case. When the count was 0 assume the garbage collector
> is active on this entry and take the lock and recheck.
>
> Otherwise the path is lockless.
>
> v2: Add GC race check based on Trond's feedback
> Cc: [email protected]
> Reported-by: Fengguang Wu
> Signed-off-by: Andi Kleen <[email protected]>
> ---
> net/sunrpc/auth.c | 19 +++++++++++++++----
> 1 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
> index 727e506..645769e 100644
> --- a/net/sunrpc/auth.c
> +++ b/net/sunrpc/auth.c
> @@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
> hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
> if (!entry->cr_ops->crmatch(acred, entry, flags))
> continue;
> - spin_lock(&cache->lock);
> if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) {
> - spin_unlock(&cache->lock);
> continue;
> }
> - cred = get_rpccred(entry);
> - spin_unlock(&cache->lock);
> + cred = entry;
> + if (atomic_add_return(1, &cred->cr_count) == 1) {
> + /*
> + * When the count was 0 we could race with the GC.
> + * Take the lock and recheck.
> + */
> + spin_lock(&cache->lock);
> + if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) {
> + /* Lost the race. */
> + atomic_dec(&cred->cr_count);
> + spin_unlock(&cache->lock);
> + continue;
> + }
> + spin_unlock(&cache->lock);
> + }
> break;
> }
> rcu_read_unlock();
> --
> 1.7.7

2012-07-05 16:27:21

by Andi Kleen

[permalink] [raw]
Subject: Re: rpcauth_lookup_credcache() lock contentions

On Thu, Jul 05, 2012 at 09:11:05PM +0800, Fengguang Wu wrote:
> On Wed, Jun 27, 2012 at 11:03:53AM -0700, Andi Kleen wrote:
> > > > Can you try this patch?
> > >
> > > Thank you! This patch brings %sys down to 24%, a pretty large improvement!
> >
> > I redid the patch now and fixed the race Trond pointed out.
> >
> > Fengguang, can you retest please? Trond, does it look good now?
>
> Andi, this patch performs equally well! And it runs stable for over a week.

Thanks for testing. But it still has the race Trond pointed out.
Probably need to rely on write ordering with the uncached flag.

-Andi