2013-12-10 05:03:23

by Jitesh Shah

Subject: Reproducible block structure

Hello all,
Assuming I have the latest ext4 system, I have a question ->
Scenario -> I have 5 HDDs onto which I burn the same block image.
These HDDs are usually mounted RO when used in production. For
upgrading, I connect each one to an isolated machine, mount it as RW
and run a script to update files (basically add, delete, modify
files).

Now, if the script is run in the SAME way for all 5 HDDs, is it
guaranteed that these HDDs will be the same at the block level too? (i.e.
block allocation/deallocation will follow the same pattern). Assume
single-core system with only one process modifying the HDD in
predetermined order.

If there is no way to provide such a guarantee in a default ext4
install, is there an ext4 option (ordered?) which can provide this
guarantee?

Why do I ask -> I am tinkering with the idea of block level
verification of images. If the above guarantees can be provided, I can
easily hash the raw HDD for verification purposes.
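
A minimal sketch of what hashing the raw HDD could look like, assuming Python 3 and read access to the device node or an image file. The function name `hash_image`, the choice of SHA-256, and the 1 MiB chunk size are illustrative assumptions, not something specified in this thread.

```python
import hashlib


def hash_image(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a raw device or image file.

    `path` may be a regular image file or a block device node; the
    contents are read in fixed-size chunks so memory use stays constant.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file/device
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Two drives verify as identical only if every byte matches, so a single differently allocated block changes the digest. Reading a device node typically requires root privileges.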

Thanks,
Jitesh

PS: I am not subscribed to the mailing list, so please include my id
in the response if your email client does not put it there already!


2013-12-10 13:32:13

by Carlos Maiolino

Subject: Re: Reproducible block structure

On Mon, Dec 09, 2013 at 09:03:23PM -0800, Jitesh Shah wrote:
> Hello all,
> Assuming I have the latest ext4 system, I have a question ->
> Scenario -> I have 5 HDDs onto which I burn the same block image.
> These HDDs are usually mounted RO when used in production. For
> upgrading, I connect each one to an isolated machine, mount it as RW
> and run a script to update files (basically add, delete, modify
> files).
>
> Now, if the script is run in the SAME way for all 5 HDDs, is it
> guaranteed that these HDDs will be the same at the block level too? (i.e.
> block allocation/deallocation will follow the same pattern). Assume
> single-core system with only one process modifying the HDD in
> predetermined order.
>
No, you can't assume that. Although ext4 might issue the same write patterns, the
caches behind the scenes might change the way the data actually gets written.

> If there is no way to provide such a guarantee in a default ext4
> install, is there an ext4 option (ordered?) which can provide this
> guarantee?
>
The ordered option for the journal guarantees journal ordering, but not what
you're aiming for.

> Why do I ask -> I am tinkering with the idea of block level
> verification of images. If the above guarantees can be provided, I can
> easily hash the raw HDD for verification purposes.
>

I'm not sure you'll be able to do it. Different HDDs, even from the same
vendor, might have slight differences which might change the write patterns.
There are also some performance features, like delalloc and mballoc, that
might not produce the same allocation result on all your HDDs.

Maybe (although I'm not the best person to confirm it) you can succeed at
that if you disable the HDDs' write caches and use a less optimized filesystem
with no journal, like ext2, or maybe use synchronous writes with ext4. But even
with sync writes, I'm not sure you'll get exactly the same result on all HDDs,
due to the journal and the dynamic allocation algorithms.
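
Whether a given setup actually diverges can be checked empirically by comparing two images block by block. The following is a sketch under stated assumptions: the helper name `differing_blocks` is hypothetical, and the 4096-byte block size is only a common ext4 default that should be checked against the real filesystem.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Yield the indices of blocks that differ between two raw images.

    If one image is shorter, its missing blocks compare as empty and
    are therefore reported as differing.
    """
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        index = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:  # both images exhausted
                break
            if block_a != block_b:
                yield index
            index += 1
```

An empty result from `list(differing_blocks(img1, img2))` means the two images are byte-identical. A non-empty result shows where the allocation or metadata diverged.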

> Thanks,
> Jitesh
>
> PS: I am not subscribed to the mailing list, so please include my id
> in the response if your email client does not put it there already!

Please subscribe to the mailing list. This helps keep the discussion on the
list and saves you from checking whether somebody has already answered you.

--
Carlos

2013-12-10 13:43:01

by Theodore Ts'o

Subject: Re: Reproducible block structure

On Mon, Dec 09, 2013 at 09:03:23PM -0800, Jitesh Shah wrote:
>
> Now, if the script is run in the SAME way for all 5 HDDs, is it
> guaranteed that these HDDs will be the same at the block level too? (i.e.
> block allocation/deallocation will follow the same pattern). Assume
> single-core system with only one process modifying the HDD in
> predetermined order.

Nope, there's no way to guarantee this. There are a few places where
the algorithms are non-deterministic by design. It would be possible
to make some changes to guarantee this, but I'm not sure it's really
worth it --- in real life, assuming a single core system with a single
process which is also single threaded is generally not a realistic
scenario.

> Why do I ask -> I am tinkering with the idea of block level
> verification of images. If the above guarantees can be provided, I can
> easily hash the raw HDD for verification purposes.

If you want to do a block level verification of the image, why not
also do block level update of the image as well?
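
One way a block level update could be sketched: update a single reference image with the file-based script, then copy only the blocks that changed onto each target drive. This is an illustration, not a method described in this thread. The function name `apply_block_update` and the 4096-byte block size are assumptions, and the target is assumed to be the same size as the reference.

```python
def apply_block_update(reference_path, target_path, block_size=4096):
    """Overwrite blocks in the target that differ from the reference.

    Returns the number of blocks written. Assumes both images have the
    same size; any target data beyond the end of the reference is left
    untouched.
    """
    written = 0
    with open(reference_path, "rb") as ref, open(target_path, "r+b") as tgt:
        offset = 0
        while True:
            ref_block = ref.read(block_size)
            if not ref_block:  # end of reference image
                break
            tgt.seek(offset)
            tgt_block = tgt.read(len(ref_block))
            if tgt_block != ref_block:
                tgt.seek(offset)  # rewind to the start of this block
                tgt.write(ref_block)
                written += 1
            offset += len(ref_block)
    return written
```

Because every target ends up byte-identical to the reference, a hash of the raw device is then a valid check, regardless of how the filesystem allocated blocks while the reference was being updated.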

Regards,

- Ted

2013-12-10 18:08:06

by Jitesh Shah

Subject: Re: Reproducible block structure

.. inline ..

> > Now, if the script is run in the SAME way for all 5 HDDs, is it
> > guaranteed that these HDDs will be the same at the block level too? (i.e.
> > block allocation/deallocation will follow the same pattern). Assume
> > single-core system with only one process modifying the HDD in
> > predetermined order.
>
> Nope, there's no way to guarantee this.

Thanks. I was quite sure of this, but still thought it was a good idea
to ask in case I am missing an obscure detail.

> > Why do I ask -> I am tinkering with the idea of block level
> > verification of images. If the above guarantees can be provided, I can
> > easily hash the raw HDD for verification purposes.
>
> If you want to do a block level verification of the image, why not
> also do block level update of the image as well?

Yep. That's plan B. We had a bunch of utilities using file-based
approaches. I was trying to find a way to move them one-by-one. Looks
like moving to a block-based approach altogether is a more worthwhile
investment of time.

Thanks a lot for your responses, Carlos and Ted.

Jitesh