2013-06-25 08:56:35

by Daniel J Blueman

Subject: 13GB dcache+inode cache hash tables

As memory capacity increases, we see the dentry and inode cache hash
tables grow to wild sizes [1], e.g. 13GB is consumed on a 4.5TB system.

Perhaps a better approach would add a linear component to the exponent
to give tuned scaling, given that spatial locality is an advantage in
hash tables, as is careful use of resources.

The same approach would suit other hash tables (mount-cache, TCP
established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
coefficients, so perhaps we could generalise.

If so, what are reasonable reference points and assumptions?

Thanks,
Daniel

--- [1]

1GB:
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)

8GB:
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)

1TB:
Dentry cache hash table entries: 134217728 (order: 18, 1073741824 bytes)
Inode-cache hash table entries: 67108864 (order: 17, 536870912 bytes)

4.5TB:
Dentry cache hash table entries: 1073741824 (order: 21, 8589934592 bytes)
Inode-cache hash table entries: 536870912 (order: 20, 4294967296 bytes)
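
The figures in [1] are consistent with a simple rule: one bucket per
2^scale bytes of memory, rounded up to a power of two. The sketch below
reproduces them under assumptions that are inferred from the numbers
rather than stated in this thread: scale 13 for the dentry cache, scale
14 for the inode cache, 4096-byte pages, and 8 bytes per bucket (the 1GB
lines only fit with 4 bytes per bucket, which may indicate a 32-bit
kernel). The function and parameter names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages
GIB = 1 << 30
TIB = 1 << 40


def roundup_pow_of_two(n):
    """Return the smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def hash_table_size(mem_bytes, scale, bucket_bytes=8):
    """Return (entries, order, size_in_bytes) for a memory-scaled table.

    Assumed rule: one bucket per 2**scale bytes of memory, rounded up
    to a power of two. 'order' is log2 of the table size in pages.
    """
    entries = roundup_pow_of_two(max(mem_bytes >> scale, 1))
    size = entries * bucket_bytes
    pages = max(size // PAGE_SIZE, 1)
    order = pages.bit_length() - 1
    return entries, order, size


if __name__ == "__main__":
    for label, mem in (("8GB", 8 * GIB), ("1TB", TIB), ("4.5TB", 9 * TIB // 2)):
        for name, scale in (("Dentry cache", 13), ("Inode-cache", 14)):
            entries, order, size = hash_table_size(mem, scale)
            print(f"{label}: {name} hash table entries: {entries} "
                  f"(order: {order}, {size} bytes)")
```

Under this rule the power-of-two rounding alone nearly doubles the 4.5TB
tables: 4.5TB >> 13 is about 604 million buckets, which rounds up to
1073741824.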
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia


2013-06-25 09:48:19

by Eric Dumazet

Subject: Re: 13GB dcache+inode cache hash tables

On Tue, 2013-06-25 at 16:56 +0800, Daniel J Blueman wrote:
> As memory capacity increases, we see the dentry and inode cache hash
> tables grow to wild sizes [1], eg 13GB is consumed on a 4.5TB system.
>
> Perhaps a better approach adds a linear component to an exponent to give
> tuned scaling, given that spatial locality is an advantage in hash table
> and careful use of resources.
>
> The same approach would fit to other hash tables (mount-cache, TCP
> established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
> coefficients, so perhaps we could generalise.
>

TCP hash table is limited to 512K slots, unless overridden.
TCP bind limited to 64K slots.
UDP limited to 64K slots.

> If so what are reasonable reference points and assumptions?
>

I do not know what you have in mind, please show us a patch ;)

I would love it if all these hash tables could use hugepages.

vmalloc() is nice for NUMA spreading, but being able to use hugepages
for very large hashes could lower TLB pressure...

# grep alloc_large_system_hash /proc/vmallocinfo
0xffffc90000002000-0xffffc90004003000 67112960 alloc_large_system_hash+0x153/0x21c pages=16384 vmalloc vpages N0=8192 N1=8192
0xffffc90004003000-0xffffc90004024000 135168 alloc_large_system_hash+0x153/0x21c pages=32 vmalloc N0=16 N1=16
0xffffc90004024000-0xffffc90006025000 33558528 alloc_large_system_hash+0x153/0x21c pages=8192 vmalloc vpages N0=4096 N1=4096
0xffffc90006025000-0xffffc90006036000 69632 alloc_large_system_hash+0x153/0x21c pages=16 vmalloc N0=8 N1=8
0xffffc90006052000-0xffffc90006057000 20480 alloc_large_system_hash+0x153/0x21c pages=4 vmalloc N0=2 N1=2
0xffffc90016081000-0xffffc90016882000 8392704 alloc_large_system_hash+0x153/0x21c pages=2048 vmalloc vpages N0=1024 N1=1024
0xffffc90016882000-0xffffc90016983000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128
0xffffc90016983000-0xffffc90016a84000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128
0xffffc90016a84000-0xffffc90016b85000 1052672 alloc_large_system_hash+0x153/0x21c pages=256 vmalloc N0=128 N1=128

# dmesg | grep hash
[ 0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.003976] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes)
[ 0.016692] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes)
[ 0.022074] Mount-cache hash table entries: 256
[ 1.089249] TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
[ 1.090651] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[ 1.090946] UDP hash table entries: 32768 (order: 8, 1048576 bytes)
[ 1.091187] UDP-Lite hash table entries: 32768 (order: 8, 1048576 bytes)
[ 1.119761] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
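
To put the TLB point in rough numbers, the sketch below sums the
alloc_large_system_hash areas from /proc/vmallocinfo-style text and
compares how many pages a table spans at two page sizes. It is only an
illustration: the helper names are hypothetical, the field layout is
taken from the output above, and the 2 MiB hugepage size is an
assumption (typical for x86-64). Note that in the output above each
size is 4096 bytes larger than pages * 4096.

```python
def large_hash_bytes(vmallocinfo_text):
    """Sum the size column of alloc_large_system_hash areas.

    Expects lines shaped like the /proc/vmallocinfo output above:
    '<start>-<end> <size> <caller> ...'.
    """
    total = 0
    for line in vmallocinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("alloc_large_system_hash"):
            total += int(fields[1])
    return total


def pages_needed(size_bytes, page_bytes):
    """Number of pages of page_bytes needed to cover size_bytes."""
    return -(-size_bytes // page_bytes)  # ceiling division


SAMPLE = """\
0xffffc90000002000-0xffffc90004003000 67112960 alloc_large_system_hash+0x153/0x21c pages=16384 vmalloc vpages N0=8192 N1=8192
0xffffc90004003000-0xffffc90004024000 135168 alloc_large_system_hash+0x153/0x21c pages=32 vmalloc N0=16 N1=16
"""

if __name__ == "__main__":
    dentry_table = 67108864  # bytes, from the dmesg line above
    print("bytes in sampled areas:", large_hash_bytes(SAMPLE))
    print("4 KiB pages:", pages_needed(dentry_table, 4096))
    print("2 MiB pages:", pages_needed(dentry_table, 2 * 1024 * 1024))
```

For the 64MB dentry table this is 16384 base pages against 32 pages of
2 MiB, which is the gap in TLB reach that hugepages would close.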

2013-06-27 09:08:40

by Daniel J Blueman

Subject: Re: 13GB dcache+inode cache hash tables

On 25/06/2013 17:48, Eric Dumazet wrote:
> On Tue, 2013-06-25 at 16:56 +0800, Daniel J Blueman wrote:
>> As memory capacity increases, we see the dentry and inode cache hash
>> tables grow to wild sizes [1], eg 13GB is consumed on a 4.5TB system.
>>
>> Perhaps a better approach adds a linear component to an exponent to give
>> tuned scaling, given that spatial locality is an advantage in hash table
>> and careful use of resources.
>>
>> The same approach would fit to other hash tables (mount-cache, TCP
>> established, TCP bind, UDP, UDP-Lite, Dquot-cache) with different
>> coefficients, so perhaps we could generalise.
>>
>
> TCP hash table is limited to 512K slots, unless overridden.
> TCP bind limited to 64K slots.
> UDP limited to 64K slots.
>
>> If so what are reasonable reference points and assumptions?
>
> I do not know what you have in mind, please show us a patch ;)
[...]

Alright, I'll see what I can get together in the next week or so when I
can fit it in.

Dan
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia