2017-07-30 02:37:18

by Matthew Wilcox (Oracle)

Subject: Re: [ext2] Mislabeled quadratic probing?

On Sat, Jul 29, 2017 at 10:24:29AM -0400, Sean Anderson wrote:
> Hi,
>
> I was reading through the ext2 inode allocation code, and found the
> following snippet in fs/ext2/ialloc.c:find_group_other
>
> /*
>  * Use a quadratic hash to find a group with a free inode and some
>  * free blocks.
>  */
> for (i = 1; i < ngroups; i <<= 1) {
>         group += i;
>         if (group >= ngroups)
>                 group -= ngroups;
>         desc = ext2_get_group_desc (sb, group, NULL);
>         if (desc && le16_to_cpu(desc->bg_free_inodes_count) &&
>             le16_to_cpu(desc->bg_free_blocks_count))
>                 goto found;
> }
>
> As I understand it, quadratic probing starting at a hash H would try
> positions H+1, H+4, H+9, H+16, H+25, etc. Here, however, the algorithm
> appears to try positions H+1, H+3, H+7, H+15, H+31, etc., which appears
> to be some form of exponential probing. I was unable to find the patch
> which introduced this code, but it appears that it was introduced in
> v2.4.14.3, and before that linear probing was used. Clearly, this code
> works, and I can't really find any compelling arguments to switch to
> quadratic probing proper. I suspect it was done this way to avoid a
> multiply or an extra subtract on every loop. Can anyone shed some light
> on the choice (and apparent mislabel) of this algorithm?

It can't have been to avoid an arithmetic operation. The quadratic
hash would simply be s/<<= 1/+= 2/, which is going to cost the same on
basically every CPU.
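
To make that concrete, here's a throwaway userspace sketch (not kernel
code; the group count is made up) printing the cumulative offsets each
variant probes:

#include <stdio.h>

int main(void)
{
        unsigned int ngroups = 16;      /* made-up group count */
        unsigned int i, off;

        /* current scheme: i doubles each pass; offsets 1, 3, 7, 15, ... */
        printf("shift:    ");
        for (i = 1, off = 0; i < ngroups; i <<= 1) {
                off += i;       /* the kernel also wraps this mod ngroups */
                printf(" +%u", off);
        }
        printf("\n");

        /* s/<<= 1/+= 2/: offsets 1, 4, 9, 16, ... (true quadratic probing) */
        printf("quadratic:");
        for (i = 1, off = 0; i < ngroups; i += 2) {
                off += i;
                printf(" +%u", off);
        }
        printf("\n");
        return 0;
}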

The biggest danger I see here is that we're only going to test 32
groups before falling back to linear probing (we'll shift the single
'1' bit out of 'i' in 32 steps). That might be a performance problem,
but it should hit quite rarely.
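
(Quick self-contained check of that 32-step bound, assuming a 32-bit
counter; nothing kernel-specific here:)

#include <stdio.h>

int main(void)
{
        unsigned int i;         /* 32-bit counter, standing in for the loop index */
        unsigned int probes = 0;

        /* even with no upper bound, <<= 1 shifts the single '1' bit out
         * of a 32-bit i after 32 steps, ending the probe loop */
        for (i = 1; i != 0; i <<= 1)
                probes++;

        printf("probes before i hits zero: %u\n", probes);      /* 32 */
        return 0;
}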

The danger in changing it is that we'll end up with new files created in
a directory choosing a different block group from files created in that
directory using an old kernel. And that could be a worse performance
impact.

I think we'd need to see some benchmarks ... Ted, any suggestions for
something which might show a difference between these two approaches
to hashing?


2017-07-30 22:23:05

by Theodore Ts'o

Subject: Re: [ext2] Mislabeled quadratic probing?

On Sat, Jul 29, 2017 at 07:37:18PM -0700, Matthew Wilcox wrote:
>
> The biggest danger I see here is that we're only going to test 32
> groups before falling back to linear probing (we'll shift the single
> '1' bit out of 'i' in 32 steps). That might be a performance problem,
> but it should hit quite rarely.
>
> The danger in changing it is that we'll end up with new files created in
> a directory choosing a different block group from files created in that
> directory using an old kernel. And that could be a worse performance
> impact.

The original assumption behind this scheme was that by the time we
fill a block group (say, block group #42), it was highly likely that
the linear probing for block allocations would have filled block
groups #43, #44, #45, etc. So by the time block group #42 was filled
and we were allocating a new inode, the idea was to jump much farther
away.

It's also fair to say that we spent a lot more time thinking about
the Orlov algorithm, which spreads new directories across block
groups, than we did about how find_group_other() would work. Note:
the Orlov algorithm was originally implemented in BSD FFS, and was
then imported into ext2 by Al Viro.

> I think we'd need to see some benchmarks ... Ted, any suggestions for
> something which might show a difference between these two approaches
> to hashing?

Trying to benchmark different allocation algorithms is hard. The
problem is that allocation patterns are very different depending on
the workload, and it's usually not just one workload but a
combination of workloads, resulting in different levels of free space
fragmentation. Not only do the workloads or combinations of workloads
matter, but the amount of free space can also make a huge difference.

When I worked on the changes for ext4's allocation algorithm, I used
a combination of measuring the time to run various benchmarks,
looking at free space fragmentation with e2freefrag, and looking at
file fragmentation reports from "e2fsck -E fragcheck". I did this
after inducing various amounts of fragmentation using various ad hoc
means; I should have done more to automate this and make it more
controlled.

There is certainly much more that can be done to improve allocation
algorithms, for both ext2 and ext4. With ext2, though, you give up so
much performance from the use of indirect blocks that it's probably
not worth trying to change its inode and block allocation algorithms.

We do use the same "quadratic hash" algorithm in ext4, so something
that works better is worth some thought. One of the problems is that
depending on the number of blocks in the file system, pure quadratic
probing can cause duplicate probes to the same buckets. Unlike in a
hash table, the number of block groups isn't necessarily going to be
prime, or a power of two (in which case we could use something like
this):

((i + i*i)/2) % n
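
As a quick illustration of why the power-of-two case matters, here's a
throwaway userspace check (nothing from the tree; the group counts are
made up) showing that this probe sequence hits every bucket exactly
once when n is a power of two, but revisits buckets otherwise:

#include <stdio.h>

/* Count how many distinct buckets the first n triangular probes hit. */
static unsigned int distinct_probes(unsigned int n)
{
        unsigned char seen[4096] = { 0 };       /* assumes n <= 4096 */
        unsigned int i, hits = 0;

        for (i = 0; i < n; i++) {
                unsigned int bucket = ((i + i * i) / 2) % n;

                if (!seen[bucket]) {
                        seen[bucket] = 1;
                        hits++;
                }
        }
        return hits;
}

int main(void)
{
        printf("n=8:    %u/8\n", distinct_probes(8));        /* 8/8 */
        printf("n=12:   %u/12\n", distinct_probes(12));      /* only 8/12 */
        printf("n=1024: %u/1024\n", distinct_probes(1024));  /* 1024/1024 */
        return 0;
}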

So if we use a quadratic hash, we might want to artificially limit
the number of probes, but at the same time, we might want to make
sure we try some buckets that are fairly far away if the first couple
of probes don't succeed. But that's kind of what an exponential
probing scheme does, which is what we have. So it's not entirely
clear that using a quadratic hash would actually be a better approach.

- Ted

2017-07-31 01:40:32

by Sean Anderson

Subject: Re: [ext2] Mislabeled quadratic probing?

> The biggest danger I see here is that we're only going to test 32
> groups before falling back to linear probing (we'll shift the single
> '1' bit out of 'i' in 32 steps). That might be a performance problem,
> but it should hit quite rarely.

Actually, because the loop condition is `i < ngroups`, we test even
fewer groups than that. For example, on a 1T disk with 4K blocks there
are 8192 block groups, and the maximum number of iterations is 13. It
seems, then, that simply creating a directory, then creating at least
15 files and writing at least 128M between each, would cause a
fallback to a linear search. This doesn't seem like all that uncommon
a use case: any directory over 2GB or so with many files would be
susceptible.
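
Back-of-the-envelope in code form, using the 1T/4K numbers above
(nothing here is measured, just the arithmetic):

#include <stdio.h>

int main(void)
{
        /* 1 TiB filesystem, 4 KiB blocks: 8 * 4096 = 32768 blocks per group */
        unsigned long long fs_bytes = 1ULL << 40;
        unsigned long long group_bytes = 32768ULL * 4096;      /* 128 MiB */
        unsigned long long ngroups = fs_bytes / group_bytes;   /* 8192 */
        unsigned long long i;
        unsigned int probes = 0;

        /* same loop bound as find_group_other() */
        for (i = 1; i < ngroups; i <<= 1)
                probes++;

        printf("ngroups = %llu, probes before the linear fallback = %u\n",
               ngroups, probes);        /* 8192, 13 */
        return 0;
}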

