Hi
The eMMC 5.1 spec (JESD84-B51) defines a cache "barrier" capability for
eMMC devices.
I was wondering if there are any downsides to replacing WRITE_FLUSH_FUA
with the cache barrier.
I understand that REQ_FLUSH is used to ensure that the current cache is
flushed to prevent any reordering, but I'm not clear on why REQ_FUA is
used.
Can someone please help me understand this part?
As far as I understand it, the cache barriers can be used to replace all
the flush requests.
Please let me know if there is any downside to this.
I know there was a big decision in 2010
(https://lwn.net/Articles/400541/ and http://lwn.net/Articles/399148/)
to remove the software-based barrier support, but with the hardware
supporting "barriers", is there a downside to using them to replace the
flushes?
--
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
On Tue, Sep 15, 2015 at 04:17:46PM -0700, Nikhilesh Reddy wrote:
>
> The eMMC 5.1 spec (JESD84-B51) defines a cache "barrier" capability for
> eMMC devices.
>
> I was wondering if there are any downsides to replacing WRITE_FLUSH_FUA
> with the cache barrier.
>
> I understand that REQ_FLUSH is used to ensure that the current cache is
> flushed to prevent any reordering, but I'm not clear on why REQ_FUA is
> used.
> Can someone please help me understand this part?
>
> I know there was a big decision in 2010
> (https://lwn.net/Articles/400541/ and http://lwn.net/Articles/399148/)
> to remove the software-based barrier support, but with the hardware
> supporting "barriers", is there a downside to using them to replace the
> flushes?
OK, so a couple of things here.
There is queuing happening at two different layers in the system:
once at the block device layer, and once at the storage device layer.
(Possibly more if you have a hardware RAID card, etc., but for this
discussion, what's important is the queuing which is happening inside
the kernel, and that which is happening below the kernel.)
The transition in 2010 refers to how we handle barriers at the
block device layer, and was inspired by the fact that at that time
the vast majority of storage devices only supported "cache flush"
at the storage layer, and a few devices would support FUA (Force Unit
Access) requests. But the resulting interface can also support devices
which have a true cache barrier function.
So when we say REQ_FLUSH, what we mean is that the writes are flushed
from the block layer command queues to the storage device, and that
subsequent writes will not be reordered before the flush. Since most
devices don't support a cache barrier command, this is implemented in
practice as a FLUSH CACHE, but if the device supports a cache barrier
command, that would be sufficient.
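As a rough illustration of how the block layer learns what a device
can do, a driver for a device with a volatile write cache might
advertise its capabilities along these lines (a hypothetical sketch
against the 4.x-era blk_queue_flush() interface, not code from any
particular driver):

#include <linux/blkdev.h>

/*
 * Hypothetical example: tell the block layer that the device has a
 * volatile write cache (so REQ_FLUSH has to become a real cache flush
 * or barrier at the device), and optionally that it honors FUA writes
 * natively.  If REQ_FUA is not advertised, the block layer emulates a
 * FUA write with a write followed by a cache flush.
 */
static void example_setup_flush(struct request_queue *q,
				bool has_wcache, bool has_fua)
{
	unsigned int flush = 0;

	if (has_wcache)
		flush |= REQ_FLUSH;
	if (has_wcache && has_fua)
		flush |= REQ_FUA;

	blk_queue_flush(q, flush);
}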
The FUA write command is the command that actually has temporal
meaning; the device is not supposed to signal completion until that
particular write has been committed to stable store. And if you
combine that with a flush command, as in WRITE_FLUSH_FUA, then that
implies a cache barrier, followed by a write that should not return
until that write (the FUA write), and all preceding writes (implied by
the cache barrier), have been committed to stable store.
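For reference, in kernels of roughly that era the composite flags were
built up along these lines (quoted from memory of include/linux/fs.h,
so treat the exact definitions as approximate):

#define WRITE_FLUSH     (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH)
#define WRITE_FUA       (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FUA)
#define WRITE_FLUSH_FUA (WRITE | REQ_SYNC | REQ_NOIDLE | REQ_FLUSH | REQ_FUA)

So a WRITE_FLUSH_FUA submission asks the block layer to flush (or
barrier) everything already queued, and then to write the block itself
with FUA semantics.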
For devices that support a cache barrier, a REQ_FLUSH can be
implemented using a cache barrier. If the storage device does not
support a cache barrier, the much stronger FLUSH CACHE command will
also work, and in practice that's what gets used for most storage
devices today.
For devices that don't support a FUA write, this can be simulated
using the (overly strong) combination of a write followed by a FLUSH
CACHE command. (Note, due to regressions caused by buggy hardware,
the libata driver does not enable FUA by default. Interestingly,
apparently Windows 2012 and newer no longer tries to use FUA either;
maybe Microsoft has run into consumer-grade storage devices with
crappy firmware? That being said, if you are using SATA drives in a
JBOD which has a SAS expander, you *are* using FUA --- but presumably
people who are doing this are at bigger shops who can do proper HDD
validation and can lean on their storage vendors to make sure any
firmware bugs they find get fixed.)
So for ext4, when we do a journal commit, first we write the journal
blocks, then a REQ_FLUSH, and then we FUA write the commit block ---
which for commodity SATA drives, gets translated to write the journal
blocks, FLUSH CACHE, write the commit block, FLUSH CACHE.
If your storage device has support for a barrier command and FUA, then
this could also be translated to write the journal blocks, CACHE
BARRIER, FUA WRITE the commit block.
And of course if you don't have FUA support, but you do have the
barrier command, then this could also get translated to write the
journal blocks, CACHE BARRIER, write the commit block, FLUSH CACHE.
All of these scenarios should work just fine.
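To make that concrete, here is a simplified, hypothetical sketch of
how the final commit-record write might be issued using the
buffer-head API of that era; it is modeled loosely on what jbd2 does,
but the helper itself and its surrounding bookkeeping are made up for
illustration:

#include <linux/fs.h>
#include <linux/buffer_head.h>

/*
 * Hypothetical sketch of the tail end of a journal commit: the journal
 * data blocks have already been submitted as ordinary writes; the
 * commit record must be ordered after them and must be durable before
 * the commit is considered complete.  WRITE_FLUSH_FUA expresses exactly
 * that; on a device without native FUA, the block layer falls back to
 * a write followed by a cache flush.
 */
static int example_write_commit_record(struct buffer_head *bh, bool barrier)
{
	lock_buffer(bh);
	clear_buffer_dirty(bh);
	set_buffer_uptodate(bh);
	get_bh(bh);			/* end_buffer_write_sync drops a ref */
	bh->b_end_io = end_buffer_write_sync;

	if (barrier)
		return submit_bh(WRITE_SYNC | WRITE_FLUSH_FUA, bh);
	return submit_bh(WRITE_SYNC, bh);
}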
Hope this helps,
- Ted
On Sat 19 Sep 2015 08:42:48 PM PDT, Theodore Ts'o wrote:
> [...]
Thanks so much!!
This was really helpful!
--
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
Hi Ted
Can you please point me to instructions for setting up and running
xfstests for ext4 on a local qemu installation, starting from scratch?
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
On Thu, Oct 22, 2015 at 11:33:50PM -0700, Nikhilesh Reddy wrote:
> Hi Ted
>
> Can you please point me to instructions for setting up and running
> xfstests for ext4 on a local qemu installation, starting from scratch?
https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/quick-start?h=META
I haven't tried to make this work for qemu for arm (which I assume
you'd be more interested in), but if you do, please let me know. Also
note that I *do* have changes to build xfstests for android / bionic.
The build infrastructure is in the xfstests-bld git tree; however,
some of the changes to xfstests and xfsprogs haven't been accepted
upstream yet, so let me know if you are interested and I'll get you
those patches.
What's missing is the automation to talk to an Android device; I
ultimately fixed the bug I was chasing via other means.
(Unfortunately the USB-C device that was supposedly able to power a
MacBook Pro as well as connect to a USB-attached SSD didn't work
against a Nexus 5X, and so I never finished getting xfstests running
on Android, although 95% of the work should be done.)
The two other missing pieces were getting upstream fio working on
Android/bionic (although there is a fio in the AOSP tree which should
work), and IIRC one or two fixup patches that I needed against the
bleeding-edge tip of coreutils so it would work with the latest
Android NDK. They were pretty obvious, but if you want I can dig up
the changes from my tree.
Finally, if you are doing x86-based development, you might be
interested in using Google Compute Engine to run your tests. I do
most of my testing on GCE these days, because it's much faster and I
can run multiple tests in parallel.
https://git.kernel.org/cgit/fs/ext2/xfstests-bld.git/tree/kvm-xfstests/README.GCE
Cheers,
- Ted
On Fri 23 Oct 2015 02:34:28 AM PDT, Theodore Ts'o wrote:
> [...]
Sorry for the delayed reply ... been crazy busy!
*Thanks so much, Ted! You are really awesome!*
Yes, I am trying to get this working both on Android (and run some kind
of automation) and on an ARM qemu setup for development.
I would be grateful if you could share the patches that fix the known
issues, even if they are not fully functional.
I can try to see if I can finish the automation part and send the
patches, assuming I can finish them before you do. :)
Yes, I will look into GCE as well.
--
Thanks
Nikhilesh Reddy
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.