From: Cholerae Hu
Subject: Re: A blocksize problem about dax and ext4
Date: Thu, 24 Dec 2015 08:34:45 +0800
To: Dave Chinner
Cc: Ted Tso, linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, xfs@oss.sgi.com, adilger.kernel@dilger.ca, Dan Williams, linux-ext4@vger.kernel.org, "Elliott, Robert (Persistent Memory)"
In-Reply-To: <20151224000021.GU19802@dastard>
References: <94D0CD8314A33A4D9D801C0FE68B40295BEC985F@G9W0745.americas.hpqcorp.net> <20151224000021.GU19802@dastard>

The block size is 1024.

# dumpe2fs -h /dev/pmem0 | grep "Block size"
dumpe2fs 1.42.13 (17-May-2015)
Block size:               1024

I tried it on xfs and it succeeded. Here is the output:

# mkfs.xfs -f -b size=1024 /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1
data     =                       bsize=1024   blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=1024   blocks=2571, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount -o dax /dev/pmem0 /mnt/mem

The mount command doesn't print any message, and I can successfully read and write files in /mnt/mem.
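A silent mount does not by itself show that DAX is in effect: in the dmesg output quoted below, XFS reports "Turning DAX off" for a 1024-byte block size and then mounts normally. A minimal sketch for checking, assuming the filesystem lists `dax` among its options in /proc/mounts when DAX is active (the helper names `has_dax_option` and `mount_options` are made up for illustration):

```shell
# has_dax_option OPTIONS
# Succeeds if the comma-separated mount option string contains "dax".
has_dax_option() {
    case ",$1," in
        *,dax,*) return 0 ;;
        *)       return 1 ;;
    esac
}

# mount_options MOUNTPOINT
# Prints the option field (4th column) of the matching /proc/mounts entry.
mount_options() {
    awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts
}

# Example, using the mount point from this thread:
#   has_dax_option "$(mount_options /mnt/mem)" && echo "DAX on" || echo "DAX off"
```

Checking `dmesg | tail` right after the mount, as done elsewhere in this thread, gives the same answer from the kernel's side.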

2015-12-24 8:00 GMT+08:00 Dave Chinner <david@fromorbit.com>:

> On Wed, Dec 23, 2015 at 09:18:05PM +0000, Elliott, Robert (Persistent Memory) wrote:
> >
> > > -----Original Message-----
> > > From: Linux-nvdimm [mailto:linux-nvdimm-bounces@lists.01.org] On Behalf Of
> > > Dan Williams
> > > Sent: Wednesday, December 23, 2015 11:16 AM
> > > To: Cholerae Hu <choleraehyq@gmail.com>
> > > Cc: linux-nvdimm@lists.01.org
> > > Subject: Re: A blocksize problem about dax and ext4
> > >
> > > On Wed, Dec 23, 2015 at 4:03 AM, Cholerae Hu <choleraehyq@gmail.com>
> > > wrote:
> > ...
> > > > [root@localhost cholerae]# mount -o dax /dev/pmem0 /mnt/mem
> > > > mount: wrong fs type, bad option, bad superblock on /dev/pmem0,
> > > >        missing codepage or helper program, or other error
> > > >
> > > >        In some cases useful info is found in syslog - try
> > > >        dmesg | tail or so.
> > > > [root@localhost cholerae]# dmesg | tail
> > ...
> > > > [   81.779582] EXT4-fs (pmem0): error: unsupported blocksize for dax
> > ...
> >
> > > What's the fs block size?  For example:
> > > # dumpe2fs -h /dev/pmem0 | grep "Block size"
> > > dumpe2fs 1.42.9 (28-Dec-2013)
> > > Block size:               4096
> > > Depending on the size of /dev/pmem0 it may have automatically set it
> > > to a block size less than 4 KiB which is incompatible with "-o dax".
> >
> > I noticed a few things while trying that out on both ext4 and xfs.
> >
> > $ sudo mkfs.ext4 -F -b 1024 /dev/pmem0
> > $ sudo mount -o dax /dev/pmem0 /mnt/ext4-pmem0
> > $ sudo mkfs.xfs -f -b size=1024 /dev/pmem0
> > $ sudo mount -o dax /dev/pmem0 /mnt/xfs-pmem0
> >
> > [  199.679195] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > [  199.724931] EXT4-fs (pmem0): error: unsupported block size 1024 for dax
> > [  859.077766] XFS (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
> > [  859.118106] XFS (pmem0): Filesystem block size invalid for DAX Turning DAX off.
> > [  859.156950] XFS (pmem0): Mounting V4 Filesystem
> > [  859.183626] XFS (pmem0): Ending clean mount
> >
> > 1. ext4 fails to mount the filesystem, while xfs just disables DAX.
> > It seems like they should be the same.
>
> I don't really care what is done to ext4 here, but I'm not changing
> XFS behaviour. I'm expecting mixed dax/non-dax filesystems to be a
> thing, with DAX turned on by an inode flag on disk. Indeed, I see
> the mount option going away permanently for XFS, and DAX being
> controlled completely from on-disk flags. E.g. ext4 encrypted files
> need to turn off DAX, while clear text files can be accessed using
> DAX. This should happen completely transparently to the user....
>
> In the situation of block size < page size, there's things we can do
> to ensure that XFS will allocate page size aligned/sized extents
> (extent size hints FTW). This is the same mechanism that we'll use
> to ensure that extents are aligned/sized for reliable huge page
> mappings. Hence while DAX /as a global option/ needs to be turned
> off for sub-page block size filesystems, there's no reason why we
> can't turn DAX on for files that will always allocate blocks
> according to DAX constraints.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
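
The block size constraint discussed in the quoted messages can be checked before mounting. A minimal sketch, assuming the rule is that the filesystem block size must equal the system page size (the thread itself only states that block sizes below 4 KiB are rejected for "-o dax"); `dax_blocksize_ok` and `ext4_block_size` are hypothetical helper names:

```shell
# dax_blocksize_ok BLOCK_SIZE [PAGE_SIZE]
# Succeeds if BLOCK_SIZE equals PAGE_SIZE.
# PAGE_SIZE defaults to the running system's page size.
dax_blocksize_ok() {
    block_size=$1
    page_size=${2:-$(getconf PAGESIZE)}
    [ "$block_size" -eq "$page_size" ]
}

# ext4_block_size DEVICE
# Extracts the "Block size" value from the dumpe2fs header output.
ext4_block_size() {
    dumpe2fs -h "$1" 2>/dev/null |
        awk -F: '/^Block size/ { gsub(/ /, "", $2); print $2 }'
}

# Example, using the device from this thread:
#   dax_blocksize_ok "$(ext4_block_size /dev/pmem0)" ||
#       echo "block size differs from page size; consider mkfs.ext4 -b 4096"
```

Passing the page size as an optional second argument keeps the comparison usable on systems whose page size is not 4 KiB.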