This came about as part of auditing prandom_u32() usage, but
this is a special case: sometimes the starting value comes
from prandom_u32, and sometimes it comes from a hash of a name.
Does the name hash algorithm have to be stable? In that case, this
change would alter it. But it appears to use s_hash_seed which
is regenerated on "e2fsck -D", so maybe changing it isn't a big deal.
Feedback needed.
Signed-off-by: George Spelvin <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: [email protected]
---
fs/ext4/ialloc.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 7db0c8814f2ec..a4ea89b3ed368 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -457,9 +457,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
grp = hinfo.hash;
} else
grp = prandom_u32();
- parent_group = (unsigned)grp % ngroups;
- for (i = 0; i < ngroups; i++) {
- g = (parent_group + i) % ngroups;
+ g = parent_group = reciprocal_scale(grp, ngroups);
+ for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
get_orlov_stats(sb, g, flex_size, &stats);
if (!stats.free_inodes)
continue;
--
2.26.0
On Mar 18, 2019, at 7:32 PM, George Spelvin <[email protected]> wrote:
>
> This came about as part of auditing prandom_u32() usage, but
> this is a special case: sometimes the starting value comes
> from prandom_u32, and sometimes it comes from a hash of a name.
>
> Does the name hash algorithm have to be stable? In that case, this
> change would alter it. But it appears to use s_hash_seed which
> is regenerated on "e2fsck -D", so maybe changing it isn't a big deal.
This function is only selecting a starting group when searching for
a place to allocate a directory. It does not need to be stable.
The use of the name hash was introduced in the following commit:
f157a4aa98a18bd3817a72bea90d48494e2586e7
Author: Theodore Ts'o <[email protected]>
AuthorDate: Sat Jun 13 11:09:42 2009 -0400
ext4: Use hash of topdir directory name for Orlov parent group
Instead of using a random number to determine the goal parent group
for Orlov top directories, use a hash of the directory name. This
allows for repeatable results when trying to benchmark filesystem
layout algorithms.
Signed-off-by: "Theodore Ts'o" <[email protected]>
So I think the current patch is fine. The for-loop construct of
using "++g == ngroups && (g = 0)" to wrap "g" around is new to me,
but looks correct.
Reviewed-by: Andreas Dilger <[email protected]>
> Signed-off-by: George Spelvin <[email protected]>
> Cc: "Theodore Ts'o" <[email protected]>
> Cc: Andreas Dilger <[email protected]>
> Cc: [email protected]
> ---
> fs/ext4/ialloc.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 7db0c8814f2ec..a4ea89b3ed368 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -457,9 +457,8 @@ static int find_group_orlov(struct super_block *sb, struct inode *parent,
> grp = hinfo.hash;
> } else
> grp = prandom_u32();
> - parent_group = (unsigned)grp % ngroups;
> - for (i = 0; i < ngroups; i++) {
> - g = (parent_group + i) % ngroups;
> + g = parent_group = reciprocal_scale(grp, ngroups);
> + for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
> get_orlov_stats(sb, g, flex_size, &stats);
> if (!stats.free_inodes)
> continue;
> --
> 2.26.0
>
Cheers, Andreas
On Sat, Mar 28, 2020 at 04:56:17PM -0600, Andreas Dilger wrote:
> On Mar 18, 2019, at 7:32 PM, George Spelvin <[email protected]> wrote:
>> Does the name hash algorithm have to be stable? In that case, this
>> change would alter it. But it appears to use s_hash_seed which
>> is regenerated on "e2fsck -D", so maybe changing it isn't a big deal.
>
> This function is only selecting a starting group when searching for
> a place to allocate a directory. It does not need to be stable.
>
> The use of the name hash was introduced in the following commit:
>
> f157a4aa98a18bd3817a72bea90d48494e2586e7
> Author: Theodore Ts'o <[email protected]>
> AuthorDate: Sat Jun 13 11:09:42 2009 -0400
>
> ext4: Use hash of topdir directory name for Orlov parent group
>
> Instead of using a random number to determine the goal parent group
> for Orlov top directories, use a hash of the directory name. This
> allows for repeatable results when trying to benchmark filesystem
> layout algorithms.
>
> Signed-off-by: "Theodore Ts'o" <[email protected]>
>
> So I think the current patch is fine. The for-loop construct of
> using "++g == ngroups && (g = 0)" to wrap "g" around is new to me,
> but looks correct.
>
> Reviewed-by: Andreas Dilger <[email protected]>
Thank you. Standing back and looking from higher altitude, I missed
a second modulo at fallback_retry: which should be made consistent,
so I need a one re-spin.
Also, we could, if desired, eliminate the i variable entirely
using the fact that we have a copy of the starting position cached
in parent_group. I.e.
g = parent_group = reciprocal_scale(grp, ngroups);
- for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
+ do {
...
- }
+ if (++g == ngroups)
+ g = 0;
+ } while (g != parent_group);
Or perhaps the following would be simpler, replacing the modulo
with a conditional subtract:
- g = parent_group = reciprocal_scale(grp, ngroups);
+ parent_group = reciprocal_scale(grp, ngroups);
- for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
+ for (i = 0; i < ngroups; i++) {
+ g = parent_group + i;
+ if (g >= ngroups)
+ g -= ngroups;
The conditional branch starts out always false, and ends up always true,
but except for a few bobbles when it switches, branch prediction should
handle it very well.
Any preference?
(Seriously, thank you for a second set of eyes. This patch set
contains so many almost-identical changes that my eyes were glazing
over and I couldn't see bugs.)
On Mar 28, 2020, at 5:15 PM, George Spelvin <[email protected]> wrote:
>
> On Sat, Mar 28, 2020 at 04:56:17PM -0600, Andreas Dilger wrote:
>>
>> So I think the current patch is fine. The for-loop construct of
>> using "++g == ngroups && (g = 0)" to wrap "g" around is new to me,
>> but looks correct.
>>
>> Reviewed-by: Andreas Dilger <[email protected]>
>
> Thank you. Standing back and looking from higher altitude, I missed
> a second modulo at fallback_retry: which should be made consistent,
> so I need a one re-spin.
>
> Also, we could, if desired, eliminate the i variable entirely
> using the fact that we have a copy of the starting position cached
> in parent_group. I.e.
>
> g = parent_group = reciprocal_scale(grp, ngroups);
> - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
> + do {
> ...
> - }
> + if (++g == ngroups)
> + g = 0;
> + } while (g != parent_group);
>
> Or perhaps the following would be simpler, replacing the modulo
> with a conditional subtract:
>
> - g = parent_group = reciprocal_scale(grp, ngroups);
> + parent_group = reciprocal_scale(grp, ngroups);
> - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
> + for (i = 0; i < ngroups; i++) {
> + g = parent_group + i;
> + if (g >= ngroups)
> + g -= ngroups;
>
> The conditional branch starts out always false, and ends up always true,
> but except for a few bobbles when it switches, branch prediction should
> handle it very well.
>
> Any preference?
I was looking at whether we could use a for-loop without "i"? Something like:
for (g = parent_group + 1; g != parent_group; ++g >= ngroups && (g = 0))
The initial group is parent_group + 1, to avoid special-casing when the
initial parent_group = 0 (which would prevent the loop from terminating).
Cheers, Andreas
On Sat, Mar 28, 2020 at 06:10:11PM -0600, Andreas Dilger wrote:
> On Mar 28, 2020, at 5:15 PM, George Spelvin <[email protected]> wrote:
>> Also, we could, if desired, eliminate the i variable entirely
>> using the fact that we have a copy of the starting position cached
>> in parent_group. I.e.
>>
>> g = parent_group = reciprocal_scale(grp, ngroups);
>> - for (i = 0; i < ngroups; i++, ++g == ngroups && (g = 0)) {
>> + do {
>> ...
>> - }
>> + if (++g == ngroups)
>> + g = 0;
>> + } while (g != parent_group);
> I was looking at whether we could use a for-loop without "i"? Something like:
>
> for (g = parent_group + 1; g != parent_group; ++g >= ngroups && (g = 0))
>
> The initial group is parent_group + 1, to avoid special-casing when the
> initial parent_group = 0 (which would prevent the loop from terminating).
That's the first option I presented, above. Since a for() loop
tests before each iteration, if the counter is strictly modulo
ngroups, there's no way to execute the loop body more than ngroups-1
times.
That's why I changed to do{}while(), which has a minimum of 1 (it can't
handle ngroups == 0), but can mimic the current loop's execution
perfectly (no initial +1 offset).