From: Dmitry Monakhov Subject: Re: [PATCH] ext4: fix race aio-dio vs freeze_fs Date: Tue, 24 Nov 2015 20:55:40 +0400 Message-ID: References: <1448294568-20892-1-git-send-email-dmonakhov@openvz.org> <20151124132421.GG25232@quack.suse.cz> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="===============2813359735081620562==" Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, Dmitriy Monakhov , tytso@mit.edu, xfs@oss.sgi.com To: Jan Kara Return-path: In-Reply-To: <20151124132421.GG25232@quack.suse.cz> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: xfs-bounces@oss.sgi.com Sender: xfs-bounces@oss.sgi.com List-Id: linux-ext4.vger.kernel.org --===============2813359735081620562== Content-Type: multipart/alternative; boundary=001a1140f5563c548005254c36a4 --001a1140f5563c548005254c36a4 Content-Type: text/plain; charset=UTF-8 On Nov 24, 2015 16:25, "Jan Kara" wrote: > > On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote: > > After freeze_fs was revoked (from Jan Kara) pages's write-back completion > > is deffered before unwritten conversion, so explicit flush_unwritten_io() > > was removed here: c724585b62411 > > But we still may face deferred conversion for aio-dio case > > # Trivial testcase > > for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & > > fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ > > --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite > > NOTE: Sane testcase should be integrated to xfstests, but it requires > > changes in common/* code, so let's use this this test at the moment. > > > > In order to fix this race we have to guard journal transaction with explicit > > sb_{start,end}_intwrite() as we do with ext4_evict_inode here:8e8ad8a5 > > Well, this problem seems to suggest that we have the freeze protection for > AIO writes wrong. We should call file_end_write() from aio_complete() and > not from aio_run_iocb()... Yep. It was my first attempt to fix that issue, but unfortunately this trick will break lockdep. Caller will do file_start_write and exit to userspace. Lockdep treats such behaviour as bug (return to userspace with a lock held) There are two way to fix that 1) add specific 'long' lock primitive to lockdep 2) let sync_filesystems to wait pended aio-dio > I believe XFS and other filesystems may have > problems with this as well (CCed). Attached patch (so far only compile > tested since my test machine is pondering on something else) should fix > this. > > Honza > > -- > Jan Kara > SUSE Labs, CR --001a1140f5563c548005254c36a4 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


On Nov 24, 2015 16:25, "Jan Kara" <jack@suse.cz> wrote:
>
> On Mon 23-11-15 20:02:48, Dmitry Monakhov wrote:
> > After freeze_fs was revoked (from Jan Kara) pages's write-bac= k completion
> > is deffered before unwritten conversion, so explicit flush_unwrit= ten_io()
> > was removed here: c724585b62411
> > But we still may face deferred conversion for aio-dio case
> > # Trivial testcase
> > for ((i=3D0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -= u /mnt;done &
> > fio --bs=3D4k --ioengine=3Dlibaio --iodepth=3D128 --size=3D1g --d= irect=3D1 \
> >=C2=A0 =C2=A0 =C2=A0--runtime=3D60 --filename=3D/mnt/file --name= =3Drand-write --rw=3Drandwrite
> > NOTE: Sane testcase should be integrated to xfstests, but it requ= ires
> > changes in common/* code, so let's use this this test at the = moment.
> >
> > In order to fix this race we have to guard journal transaction wi= th explicit
> > sb_{start,end}_intwrite()=C2=A0 as we do with ext4_evict_inode he= re:8e8ad8a5
>
> Well, this problem seems to suggest that we have the freeze protection= for
> AIO writes wrong. We should call file_end_write() from aio_complete() = and
> not from aio_run_iocb()...
Yep. It was my first attempt to fix that issue, but=C2=A0 unfortunately thi= s trick will break lockdep. Caller will do file_start_write and exit to use= rspace. Lockdep treats such behaviour as bug (return to userspace with a lo= ck held)

There are two way to fix that
1) add specific 'long' lock primitive to lockdep
2) let sync_filesystems to wait pended aio-dio

> I believe XFS and other filesystems may have
> problems with this as well (CCed). Attached patch (so far only compile=
> tested since my test machine is pondering on something else) should fi= x
> this.
>
> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 Honza
>
> --
> Jan Kara <jack@suse.com> > SUSE Labs, CR

--001a1140f5563c548005254c36a4-- --===============2813359735081620562== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs --===============2813359735081620562==--