2014-01-07 08:28:52

by Ric Wheeler

Subject: Re: status of block-integrity

On 12/23/2013 09:35 PM, Martin K. Petersen wrote:
>>>>>> "Christoph" == Christoph Hellwig <[email protected]> writes:
> Christoph> We have the block integrity code to support DIF/DIX in the
> Christoph> tree for about 5 and a half years, and we still don't
> Christoph> have a single consumer of it.
>
> What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
> then integrity protection is active from the block layer down. The only
> code that's not currently being exercised are the tag interleaving
> functions. I was hoping the FS people would use them for back pointers
> but nobody seemed to bite.
>
>
> Christoph> Given that we'll have a lot of work to do in this area with
> Christoph> block multiqueue I think it's time to either kill it off for
> Christoph> good or make sure we can actually use and test it.
>
> I don't understand why multiqueue would require a lot of work? It's just
> an extra scatterlist per request.
>
> And obviously, if there's anything that needs to be done in this area
> I'll be happy to do so...
>

One of the major knocks on Linux file systems (except for btrfs) that I hear is
the lack of full data path checksums. DIF/DIX + xfs or ext4 done right will give
us another answer here. I don't think it will be common; the request comes
mostly from very large storage customers.

We do have devices that support this and are working to get more vendor testing
done, so I would hate to see us throw out the code instead of fixing it up for
the end users that see value here.

I think that we can get this working & agree with the call to continue this
discussion (here and at LSF :))

Ric


2014-01-07 13:33:23

by Hannes Reinecke

Subject: Re: status of block-integrity

On 01/07/2014 09:28 AM, Ric Wheeler wrote:
> On 12/23/2013 09:35 PM, Martin K. Petersen wrote:
>>>>>>> "Christoph" == Christoph Hellwig <[email protected]> writes:
>> Christoph> We have the block integrity code to support DIF/DIX in the
>> Christoph> tree for about 5 and a half years, and we still don't
>> Christoph> have a single consumer of it.
>>
>> What do you mean? If you have a DIX-capable HBA (lpfc, qla2xxx, zfcp)
>> then integrity protection is active from the block layer down. The only
>> code that's not currently being exercised are the tag interleaving
>> functions. I was hoping the FS people would use them for back pointers
>> but nobody seemed to bite.
>>
>>
>> Christoph> Given that we'll have a lot of work to do in this area with
>> Christoph> block multiqueue I think it's time to either kill it off for
>> Christoph> good or make sure we can actually use and test it.
>>
>> I don't understand why multiqueue would require a lot of work? It's just
>> an extra scatterlist per request.
>>
>> And obviously, if there's anything that needs to be done in this area
>> I'll be happy to do so...
>>
>
> One of the major knocks on Linux file systems (except for btrfs)
> that I hear is the lack of full data path checksums. DIF/DIX + xfs
> or ext4 done right will give us another answer here. I don't think
> it will be common; the request comes mostly from very large
> storage customers.
>
> We do have devices that support this and are working to get more
> vendor testing done, so I would hate to see us throw out the code
> instead of fixing it up for the end users that see value here.
>
> I think that we can get this working & agree with the call to
> continue this discussion (here and at LSF :))
>
I would indeed like to have a discussion at LSF about the future of
DIX. DIF is not an issue, as most HBAs support it already and we
actually need it for proper connectivity.

DIX, OTOH, has been left dormant since time immemorial, with the
only known (supposed) user being Oracle.
(I actually talked to the DB/2 folks about it, and the response
was a polite feigned interest ...)

We need to come up with a concise story here (either integrate with
filesystems or have a userland interface), otherwise it's just dead
code and indeed should be removed.

Plus so far I've had exactly _one_ request for DIX, and even that
came from a company which has its own custom storage array firmware.
That makes me wonder if DIF/DIX is really that important or more of a
tick-mark during procurement ...

Even so, it would warrant a discussion at LSF.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
[email protected] +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

2014-01-07 23:34:23

by Matthew Wilcox

Subject: Re: status of block-integrity

On Tue, Jan 07, 2014 at 02:33:10PM +0100, Hannes Reinecke wrote:
> I would indeed like to have a discussion at LSF about the future of
> DIX. DIF is not an issue, as most HBAs support it already and we
> actually need it for proper connectivity.
>
> DIX, OTOH, has been left dormant since time immemorial, with the
> only known (supposed) user being Oracle.
> (I actually talked to the DB/2 folks about it, and the response
> was a polite feigned interest ...)

I think there's a terminology confusion here; you seem to be using DIX
to mean the TCP CRC and DIF to mean T10DIF. I've seen other people use
DIX to mean separate SGLs for metadata and DIF to mean interleaved data.
Can you confirm which thing you mean here?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2014-01-08 00:05:21

by James Bottomley

Subject: Re: status of block-integrity

On Tue, 2014-01-07 at 16:34 -0700, Matthew Wilcox wrote:
> On Tue, Jan 07, 2014 at 02:33:10PM +0100, Hannes Reinecke wrote:
> > I would indeed like to have a discussion at LSF about the future of
> > DIX. DIF is not an issue, as most HBAs support it already and we
> > actually need it for proper connectivity.
> >
> > DIX, OTOH, has been left dormant since time immemorial, with the
> > only known (supposed) user being Oracle.
> > (I actually talked to the DB/2 folks about it, and the response
> > was a polite feigned interest ...)
>
> I think there's a terminology confusion here; you seem to be using DIX
> to mean the TCP CRC and DIF to mean T10DIF. I've seen other people use
> DIX to mean separate SGLs for metadata and DIF to mean interleaved data.
> Can you confirm which thing you mean here?

No, I think you're confusing algorithms with protocols. DIF and DIX are
two names for protection envelopes. DIF verifies integrity from the HBA
to the device surface. DIX verifies integrity from an application to
the HBA. Both DIF and DIX have pluggable checksum algorithms (and, in
theory, as long as the HBA does the conversion, they don't have to use
the same one, although the confusion over protection types and
algorithms is so dense already that the only way not to go insane is to
use the same end to end one). Oracle has the best data sources to
explain this, including Martin's slides:

https://oss.oracle.com/projects/data-integrity/documentation/

The specific problem is that there's no defined interface for any
application to use DIX easily because it has to supply additional
protection information when it reads or writes data and there's no
agreed way to extend read/write to do this and, as Martin has said,
thinking about trying to do this with mmap leads to a "bonghit bonanza".

So, the question is do we need to bother with DIX at all? No filesystem
uses it and there seems to be weak user demand at best. We could just
strip DIX, losing the protection envelope from the application to the
HBA but keeping DIF, which is the protection envelope from the HBA to
the device.

James
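
For readers following the envelope discussion above, here is a minimal
Python sketch of how the two envelopes can use different guard
algorithms as long as the HBA converts between them. It assumes the
usual 8-byte protection tuple per 512-byte sector (16-bit guard tag,
16-bit application tag, 32-bit reference tag holding the low bits of
the LBA), an IP checksum on the host side and the T10 CRC
(polynomial 0x8BB7) on the wire side. The function names are
hypothetical and this is an illustration, not the kernel code.

```python
import struct

SECTOR_SIZE = 512
PI_FORMAT = ">HHI"  # guard tag, application tag, reference tag (big-endian)


def crc16_t10dif(data: bytes) -> int:
    """Bitwise CRC-16 with the T10 DIF polynomial 0x8BB7, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def ip_checksum(data: bytes) -> int:
    """16-bit one's complement sum, the cheaper host-side guard option."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f">{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_pi(sector: bytes, lba: int, guard_fn, app_tag: int = 0) -> bytes:
    """Build the 8-byte protection tuple for one sector."""
    return struct.pack(PI_FORMAT, guard_fn(sector), app_tag, lba & 0xFFFFFFFF)


def verify_pi(sector: bytes, lba: int, pi: bytes, guard_fn) -> bool:
    """Check the guard tag (data corruption) and reference tag (misdirected I/O)."""
    guard, _app_tag, ref = struct.unpack(PI_FORMAT, pi)
    return guard == guard_fn(sector) and ref == (lba & 0xFFFFFFFF)


def hba_convert(sector: bytes, lba: int, host_pi: bytes) -> bytes:
    """Hypothetical HBA step: verify the host tuple, re-emit it with a CRC guard."""
    if not verify_pi(sector, lba, host_pi, ip_checksum):
        raise ValueError("guard or reference tag mismatch before the HBA")
    _guard, app_tag, _ref = struct.unpack(PI_FORMAT, host_pi)
    return make_pi(sector, lba, crc16_t10dif, app_tag)


if __name__ == "__main__":
    sector = bytes(range(256)) * 2               # one 512-byte sector
    host_pi = make_pi(sector, 42, ip_checksum)   # host-to-HBA envelope
    wire_pi = hba_convert(sector, 42, host_pi)   # HBA-to-device envelope
    print(verify_pi(sector, 42, wire_pi, crc16_t10dif))
```

The application tag is carried through the conversion untouched; only
the guard tag is recomputed.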

2014-01-08 15:43:47

by Martin K. Petersen

Subject: Re: status of block-integrity

>>>>> "James" == James Bottomley <[email protected]> writes:

James> No, I think you're confusing algorithms with protocols. DIF and
James> DIX are two names for protection envelopes. DIF verifies
James> integrity from the HBA to the device surface. DIX verifies
James> integrity from an application to the HBA.

Actually, DIX is a data integrity-aware HBA programming interface. We
have an implementation of that interface in the SCSI layer and in some
of the initiator drivers (lpfc, qla2xxx, mptNsas).

There is no single name for stuff above DIX. Other than "block layer
data integrity goo", "page cache black magic" and "let's add a few
fields to struct iocb".

James> So, the question is do we need to bother with DIX at all? No
James> filesystem uses it

...explicitly. Every filesystem uses it implicitly. There are only two
reasons for filesystems to want to be explicitly "block layer data
integrity goo"-aware:

1. To be able to use the application tag space for back pointers or
other metadata without requiring disk format changes.

2. To facilitate passthrough of protection information submitted
via the $TBD application programming interface.

I was hoping the extN folks would be interested in (1) but there were no
takers. (2) is hard but not forgotten. In any case the status quo is
that there is no point in filesystems manually generating protection
information when the block layer is going to do it for them when the bio
is submitted.

--
Martin K. Petersen Oracle Linux Engineering
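
To make item (1) above more concrete, here is a small Python sketch of
the tag interleaving idea: a filesystem back pointer is spread across
the 2-byte application tag slots of consecutive per-sector protection
tuples, leaving guard and reference tags alone. The 8-byte tuple
layout (guard, application tag, reference tag) is assumed; the helper
names and the (inode, offset) back pointer format are hypothetical and
do not correspond to any kernel interface.

```python
import struct

PI_SIZE = 8          # bytes of protection information per sector
APP_TAG_OFFSET = 2   # the application tag follows the 2-byte guard tag
APP_TAG_SIZE = 2


def set_tags(pi_buf: bytearray, tag: bytes) -> None:
    """Interleave `tag` into the application tag slot of each tuple."""
    sectors = len(pi_buf) // PI_SIZE
    if len(tag) > sectors * APP_TAG_SIZE:
        raise ValueError("tag does not fit in the available application tags")
    padded = tag.ljust(sectors * APP_TAG_SIZE, b"\x00")
    for i in range(sectors):
        off = i * PI_SIZE + APP_TAG_OFFSET
        pi_buf[off:off + APP_TAG_SIZE] = padded[i * APP_TAG_SIZE:(i + 1) * APP_TAG_SIZE]


def get_tags(pi_buf: bytes, length: int) -> bytes:
    """Collect the application tag slots back into one byte string."""
    sectors = len(pi_buf) // PI_SIZE
    out = bytearray()
    for i in range(sectors):
        off = i * PI_SIZE + APP_TAG_OFFSET
        out += pi_buf[off:off + APP_TAG_SIZE]
    return bytes(out[:length])


if __name__ == "__main__":
    # Eight tuples cover one 4 KiB filesystem block of 512-byte sectors,
    # which gives 16 bytes of tag space without any disk format change.
    pi_buf = bytearray(PI_SIZE * 8)
    back_pointer = struct.pack(">QQ", 1234, 7)   # hypothetical (inode, offset)
    set_tags(pi_buf, back_pointer)
    print(struct.unpack(">QQ", get_tags(pi_buf, 16)))
```

With 512-byte sectors this yields only two bytes per sector, so any
real back pointer scheme would have to be sized per filesystem block
rather than per sector.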