LinuxLists.cc - Re: [RFC PATCH] dm-csum: A new device mapper target that checks data integrity

2009-06-26 07:26:47

Subject: Re: [RFC PATCH] dm-csum: A new device mapper target that checks data integrity

Hi Alberto,

+ * TODO: would it be better to have M1 and M2 apart, to improve the chances of
+ * recovery in case of a failure?
+ *

How do you guarantee consistency in case of any lost writes? Your
checksum might hold an updated value while your data block might be
lost or written to a wrong destination?

When implementing such integrity solutions, IMO, it is always
advisable to handle such error conditions else this might lead to
issues. Since, checksums are very tightly coupled with the data and
any misleading can be quite dangerous unlike parity which can be
recovered.

Calculate the data's CRC, and compare it to the one found in M1. If they
+ * match, the reading is successful. If not, compare it to the one found in
+ * M2. If they match, the reading is successful;

Also, I hope by M1 and M2 you refer to the entry for a particular
block in the respective IMD sector. What kind of mechanism do you use
to determine which is younger?
Is it the timestamp or some generation count?

I assume information is per_block_entry in the IMD sectors. Which I
don't see in your implementation?
*
+ * The imd structure consists of:
+ * - 16 bit CRC (CCITT) (big endian)
+ * - 16 bit flags (big endian)
+ * - 32 bit tag

Correct me If I am missing something.

On Fri, May 29, 2009 at 12:59 AM, Goswin von Brederlow<[email protected]> wrote:
> Now I need to get a new harddisk so I can test this without risking
> live data. :)
>
> MfG
> ? ? ? ?Goswin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [email protected]
> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>

--
Regards,
Sandeep.

?To learn is to change. Education is a process that changes the learner.?

2009-06-26 08:50:40

by SandeepKsinha

[permalink] [raw]

Subject: Re: [RFC PATCH] dm-csum: A new device mapper target that checks data integrity

Hi,

I meant sectors and not blocks.

On Fri, Jun 26, 2009 at 12:56 PM, SandeepKsinha<[email protected]> wrote:
> Hi Alberto,
>
> + * TODO: would it be better to have M1 and M2 apart, to improve the chances of
> + * recovery in case of a failure?
> + *
>
> How do you guarantee consistency in case of any lost writes? Your
> checksum might hold an updated value while your data block might be
> lost or written to a wrong destination?
>

sector in place of block.

> When implementing such integrity solutions, IMO, it is always
> advisable to handle such error conditions else this might lead to
> issues. Since, checksums ?are very tightly coupled with the data and
> any misleading can be quite dangerous unlike parity which can be
> recovered.
>
> Calculate the data's CRC, and compare it to the one found in M1. If they
> + * ? ?match, the reading is successful. If not, compare it to the one found in
> + * ? ?M2. If they match, the reading is successful;
>
> Also, I hope by M1 and M2 you refer to the entry for a particular
> block in the respective IMD sector. What kind of mechanism do you use
> to determine which is younger?
> Is it the timestamp or some generation count?
>

for a particular sector and not block.

> I assume information is per_block_entry in the IMD sectors. Which I
> don't see in your implementation?
> ?*
> + * The imd structure consists of:
> + * ? - 16 bit CRC (CCITT) (big endian)
> + * ? - 16 bit flags (big endian)
> + * ? - 32 bit tag
>
> Correct me If I am missing something.
>
> On Fri, May 29, 2009 at 12:59 AM, Goswin von Brederlow<[email protected]> wrote:
>> Now I need to get a new harddisk so I can test this without risking
>> live data. :)
>>
>> MfG
>> ? ? ? ?Goswin
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to [email protected]
>> More majordomo info at ?http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Regards,
> Sandeep.
>
>
>
>
>
>
> ?To learn is to change. Education is a process that changes the learner.?
>

--
Regards,
Sandeep.

?To learn is to change. Education is a process that changes the learner.?

2009-06-26 22:38:38

by Alberto Bertogli

[permalink] [raw]

Subject: Re: [RFC PATCH] dm-csum: A new device mapper target that checks data integrity

On Fri, Jun 26, 2009 at 12:56:42PM +0530, SandeepKsinha wrote:
> Hi Alberto,
>
> + * TODO: would it be better to have M1 and M2 apart, to improve the chances of
> + * recovery in case of a failure?
> + *
>
> How do you guarantee consistency in case of any lost writes? Your
> checksum might hold an updated value while your data block might be
> lost or written to a wrong destination?

The module aims to detect corruption, not prevent or recover from it. If the
write bio returns an error, it should be properly handled and propagated to
the upper layers.

If it returns success but writes the wrong data to the disk, then there will
be a mismatch between the checksum and the data at the destination, which will
be detected when it is read.

If a write returns success but no write ever takes place on the disk, then
dm-csum (as it is now) will not detect it; although I'm not sure if that
qualifies as on-disk data corruption or is it a disk controller issue.

> When implementing such integrity solutions, IMO, it is always
> advisable to handle such error conditions else this might lead to
> issues. Since, checksums are very tightly coupled with the data and
> any misleading can be quite dangerous unlike parity which can be
> recovered.

The code considers the possibility of bios failing, and propagates this
information to the upper layers. If you see any part of the code that lacks
error handling, please let me know so I can fix it.

> Calculate the data's CRC, and compare it to the one found in M1. If they
> + * match, the reading is successful. If not, compare it to the one found in
> + * M2. If they match, the reading is successful;
>
> Also, I hope by M1 and M2 you refer to the entry for a particular
> block in the respective IMD sector. What kind of mechanism do you use

M1 and M2 refer to whole IMD sectors.

> to determine which is younger?
> Is it the timestamp or some generation count?
>
> I assume information is per_block_entry in the IMD sectors. Which I
> don't see in your implementation?

It's per IMD sector. More specifically, struct imd_sector_header's
last_updated contains the generation count for the entire IMD sector, which is
used to determine which one is younger for updating purposes.

On reads, both IMD sectors are loaded and CRCs are verified against both.

Thanks a lot,
Alberto

2009-06-26 22:53:03

by Alan

[permalink] [raw]

Subject: Re: [RFC PATCH] dm-csum: A new device mapper target that checks data integrity

> If it returns success but writes the wrong data to the disk, then there will
> be a mismatch between the checksum and the data at the destination, which will
> be detected when it is read.

(or to the wrong place on the disk which for pre SATA is actually
possibly more likely as the command transfers are not CRC protected on
the cable - just slower, while the data is CRC protected). SATA fixes
this.

> If a write returns success but no write ever takes place on the disk, then
> dm-csum (as it is now) will not detect it; although I'm not sure if that
> qualifies as on-disk data corruption or is it a disk controller issue.

Does it matter to the poor victim ? At this point you get into putting
mirrors on disks A & B with their integrity on the opposite pair so if
one forgets to do I/O hopefully both won't.

To be honest if you are really really paranoid you don't do link layer
checksumming anyway (which is what this is) you checksum in the apps using
the data set. That protects you against lots of error cases in the
memory/cache system and on network transmissions. On big clusters
crunching enormous amounts of data all those 1 in 10^lots bit error rates
become such that it's worth the effort.

> It's per IMD sector. More specifically, struct imd_sector_header's
> last_updated contains the generation count for the entire IMD sector, which is
> used to determine which one is younger for updating purposes.
>
> On reads, both IMD sectors are loaded and CRCs are verified against both.

Seems reasonably paranoid - drives will do things under you like commit
data to disk out in a different order to the commands unless the cache
gets flushed between them but the barrier code should do that bit.

Alan