2015-11-09 23:17:32

by Eric Sandeen

Subject: discard_zeroes_data questions

Hi Martin (and linux-ext4 list...)

tl;dr:

mke2fs today thinks that if discard_zeroes_data == 1 and a BLKDISCARD ioctl of the whole device succeeds, then we have guarantees that any blocks read back will be full of zero, and we don't need to initialize them to zero. Is this ok? (barring crappy hardware, that is).

slightly longer:

Does discard_zeroes_data == 1 mean that a discard *request* will guarantee zeroes on a read, or does it mean that a discard-request-which-actually-was-executed-and-not-ignored-as-just-a-hint will give us back zeroes on a read? (because UNMAP is a hint, right? I don't know about SATA trim ...)

I did see 7985090 sd: disable discard_zeroes_data for UNMAP - so I think that for v3.19+, on *scsi*, what e2fsprogs is doing is ok (now).

But I'm wondering about dm-thin and SATA, too, so trying to figure out what discard_zeroes_data really implies. That after a BLKDISCARD, a read *will* return zeros, or that it'll return zeros *iff* the hint is taken?
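
For readers following along, the assumption described in the tl;dr can be sketched roughly as below. This is an illustration only, not the e2fsprogs code: reading the flag from a sysfs queue directory, and the helper names reports_discard_zeroes_data, try_discard and can_skip_zeroing, are assumptions made for the example. The BLKDISCARD request number is the value defined in <linux/fs.h>.

```python
import fcntl
import os
import struct

BLKDISCARD = 0x1277  # _IO(0x12, 119) from <linux/fs.h>


def reports_discard_zeroes_data(queue_dir):
    """True if <queue_dir>/discard_zeroes_data holds a nonzero integer.

    queue_dir is assumed to be something like /sys/block/<dev>/queue.
    """
    try:
        with open(os.path.join(queue_dir, "discard_zeroes_data")) as f:
            return int(f.read().strip()) != 0
    except (OSError, ValueError):
        # Attribute missing or unreadable: assume no guarantee.
        return False


def try_discard(fd, start, length):
    """Issue BLKDISCARD for bytes [start, start + length); True on success.

    DESTRUCTIVE on a real block device. The ioctl takes two 64-bit values.
    """
    try:
        fcntl.ioctl(fd, BLKDISCARD, struct.pack("QQ", start, length))
        return True
    except OSError:
        return False


def can_skip_zeroing(zeroes_data, discard_succeeded):
    """The assumption under discussion: skip zeroing only if both hold."""
    return zeroes_data and discard_succeeded
```

Whether a successful return from try_discard really means the blocks now read back as zeroes is exactly the open question in this message; the sketch only encodes the assumption, it does not verify it.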

I hope that made sense, but I've been communicating badly today. ;)

Thanks,
-Eric


2015-11-09 23:39:00

by Darrick J. Wong

Subject: Re: discard_zeroes_data questions

On Mon, Nov 09, 2015 at 05:08:38PM -0600, Eric Sandeen wrote:
> Hi Martin (and linux-ext4 list...)
>
> tl;dr:
>
> mke2fs today thinks that if discard_zeroes_data == 1 and a BLKDISCARD ioctl
> of the whole device succeeds, then we have guarantees that any blocks read
> back will be full of zero, and we don't need to initialize them to zero. Is
> this ok? (barring crappy hardware, that is).
>
> slightly longer:
>
> Does discard_zeroes_data == 1 mean that a discard *request* will guarantee
> zeroes on a read, or does it mean that a
> discard-request-which-actually-was-executed-and-not-ignored-as-just-a-hint
> will give us back zeroes on a read? (because UNMAP is a hint, right? I
> don't know about SATA trim ...)
>
> I did see 7985090 sd: disable discard_zeroes_data for UNMAP - so I think that
> for v3.19+, on *scsi*, what e2fsprogs is doing is ok (now).
>
> But I'm wondering about dm-thin and SATA, too, so trying to figure out what
> discard_zeroes_data really implies. That after a BLKDISCARD, a read *will*
> return zeros, or that it'll return zeros *iff* the hint is taken?

Last winter I sent in a patch to invalidate the page cache after a discard:
https://marc.info/?l=linux-kernel&m=142249686225748&w=2

...because e2fsck gets confused when it discards part of a d_z_d=1 device and
gets non-zeroed buffers back (from the page cache!) immediately after.

But it never went in. Should I resend it? Again? Jens never acted on it.
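
The stale-read problem described above can be shown with a toy model. This is not kernel code and not the patch linked above; ToyDevice and ToyPageCache are hypothetical classes made up for the illustration, with one byte standing in for each block.

```python
class ToyDevice:
    """In-memory stand-in for a block device that zeroes discarded blocks."""

    def __init__(self, nblocks, fill=b"\xaa"):
        self.blocks = [fill] * nblocks

    def discard(self, start, count):
        # Model d_z_d=1: discarded blocks read back as zero from the device.
        for i in range(start, start + count):
            self.blocks[i] = b"\x00"


class ToyPageCache:
    """Caches reads; a discard reaches the device but may leave stale entries."""

    def __init__(self, dev):
        self.dev = dev
        self.cached = {}

    def read(self, blk):
        # Serve from the cache if present, otherwise fetch from the device.
        if blk not in self.cached:
            self.cached[blk] = self.dev.blocks[blk]
        return self.cached[blk]

    def discard(self, start, count, invalidate):
        self.dev.discard(start, count)
        if invalidate:
            # Drop cached copies so the next read goes back to the device.
            for i in range(start, start + count):
                self.cached.pop(i, None)
```

In this model, reading a block, discarding it with invalidate=False and reading it again returns the old contents from the cache, even though the device itself now holds zeroes. With invalidate=True the second read returns the zeroed block.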

--D

> I hope that made sense, but I've been communicating badly today. ;)
>
> Thanks,
> -Eric

2015-11-10 00:15:48

by Martin K. Petersen

Subject: Re: discard_zeroes_data questions

>>>>> "Eric" == Eric Sandeen writes:

Eric,

Eric> Does discard_zeroes_data == 1 mean that a discard *request* will
Eric> guarantee zeroes on a read, or does it mean that a
Eric> discard-request-which-actually-was-executed-and-not-ignored-as-just-a-hint
Eric> will give us back zeroes on a read? (because UNMAP is a hint,
Eric> right? I don't know about SATA trim ...)

For SCSI we only set d_z_d if the device is using WRITE SAME which
provides hard guarantees (i.e. the device will physically write zeroes
to any blocks of a request that can not be successfully unmapped).

The SATA spec is full of fail but RAID controller vendors as well as
Microsoft require a device that reports DRAT/RZAT to do the right
thing. I.e. offer guarantees above and beyond what the spec can
provide.

We don't entirely trust the "Designed for Windows" sticker. So for d_z_d
to be set on a SATA device in Linux it must report DRAT/RZAT *and* be
explicitly whitelisted. I am not aware of any problems with the drives
we currently have enabled.
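
As a rough summary of the two rules above, here is a small sketch. It is not the sd or libata code: the function names, the boolean inputs and the example model string are hypothetical and exist only to spell out the conditions.

```python
def scsi_sets_dzd(uses_write_same):
    """SCSI rule as described: d_z_d only when WRITE SAME is in use."""
    return bool(uses_write_same)


def sata_sets_dzd(reports_drat_rzat, model, whitelist):
    """SATA rule as described: DRAT/RZAT reported *and* model whitelisted.

    `whitelist` is a hypothetical set of model strings for this example.
    """
    return bool(reports_drat_rzat) and model in whitelist


# Hypothetical whitelist entry, for illustration only.
EXAMPLE_WHITELIST = {"ExampleSSD 100"}
```

With these definitions, a drive that reports DRAT/RZAT but is not in the whitelist still gets d_z_d=0, which is the "we don't entirely trust the sticker" part.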

Eric> But I'm wondering about dm-thin and SATA, too, so trying to figure
Eric> out what discard_zeroes_data really implies. That after a
Eric> BLKDISCARD, a read *will* return zeros, or that it'll return zeros
Eric> *iff* the hint is taken?

Can't speak for dm-thin. But the intent is that discard_zeroes_data is a
hard guarantee and not a hint. So any stacked driver that sets it must
provide the right guarantees.
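
One conservative way for a stacked driver to honor that is to advertise the flag only when every underlying device does. The sketch below shows just that rule; it is an assumption about how such a driver could behave, not a description of dm-thin, and stacked_dzd is a hypothetical helper name.

```python
def stacked_dzd(component_flags):
    """Advertise d_z_d on a stacked device only if all components set it.

    `component_flags` is an iterable of booleans, one per underlying device.
    An empty stack advertises nothing.
    """
    flags = list(component_flags)
    return bool(flags) and all(flags)
```

A single component without the guarantee is enough to clear the flag for the whole stack, since a read could land on that component.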

--
Martin K. Petersen Oracle Linux Engineering

2015-11-10 00:22:47

by Martin K. Petersen

Subject: Re: discard_zeroes_data questions

>>>>> "Darrick" == Darrick J Wong writes:

Darrick> Last winter I sent in a patch to invalidate the page cache
Darrick> after a discard:
Darrick> https://marc.info/?l=linux-kernel&m=142249686225748&w=2

Darrick> ...because e2fsck gets confused when it discards part of a
Darrick> d_z_d=1 device and gets non-zeroed buffers back (from the page
Darrick> cache!) immediately after.

Darrick> But it never went in. Should I resend it?

Yes, please! I thought this had gone in...

--
Martin K. Petersen Oracle Linux Engineering