2010-06-01 13:18:22

by Martin K. Petersen

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

>>>>> "Christof" == Christof Schmitt <[email protected]> writes:

>> Yep, known bug. Page writeback locking is messed up for buffer_head
>> users. The extNfs folks volunteered to look into this a while back
>> but I don't think they have found the time yet.

Christof> Thanks for the info. This means that this bug appears with all
Christof> filesystems?

XFS and btrfs should be fine.

--
Martin K. Petersen Oracle Linux Engineering


2010-06-02 13:37:52

by Christof Schmitt

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Tue, Jun 01, 2010 at 09:16:35AM -0400, Martin K. Petersen wrote:
> >>>>> "Christof" == Christof Schmitt <[email protected]> writes:
>
> >> Yep, known bug. Page writeback locking is messed up for buffer_head
> >> users. The extNfs folks volunteered to look into this a while back
> >> but I don't think they have found the time yet.
>
> Christof> Thanks for the info. This means that this bug appears with all
> Christof> filesystems?
>
> XFS and btrfs should be fine.

XFS looks good in my test, thanks for the hint. I am going to use XFS
for anything related to DIF for now. It would be nice to have a
solution that works for all filesystems, but it looks like this will
take some time and work.

Christof

2010-06-02 23:20:45

by Dave Chinner

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Wed, Jun 02, 2010 at 03:37:48PM +0200, Christof Schmitt wrote:
> On Tue, Jun 01, 2010 at 09:16:35AM -0400, Martin K. Petersen wrote:
> > >>>>> "Christof" == Christof Schmitt <[email protected]> writes:
> >
> > >> Yep, known bug. Page writeback locking is messed up for buffer_head
> > >> users. The extNfs folks volunteered to look into this a while back
> > >> but I don't think they have found the time yet.
> >
> > Christof> Thanks for the info. This means that this bug appears with all
> > Christof> filesystems?
> >
> > XFS and btrfs should be fine.
>
> XFS looks good in my test, thanks for the hint. I am going to use XFS
> for anything related to DIF for now. It would be nice to have a
> solution that works for all filesystems, but it looks like this will
> take some time and work.

If you are running DIF hardware, then XFS is only OK for direct IO.
XFS will still get torn writes if you are overwriting buffered data
(either by write() or mmap()) because there are no interlocks to
prevent cached pages under writeback from being modified while DMA
is being performed.....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-06-04 01:35:32

by Martin K. Petersen

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

>>>>> "Dave" == Dave Chinner <[email protected]> writes:

Dave> If you are running DIF hardware, then XFS is only OK for direct
Dave> IO. XFS will still get torn writes if you are overwriting
Dave> buffered data (either by write() or mmap()) because there are no
Dave> interlocks to prevent cached pages under writeback from being
Dave> modified while DMA is being performed.....

Didn't you use to wait_on_page_writeback() in page_mkwrite()?

--
Martin K. Petersen Oracle Linux Engineering

2010-06-04 02:33:13

by Dave Chinner

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Thu, Jun 03, 2010 at 09:34:18PM -0400, Martin K. Petersen wrote:
> >>>>> "Dave" == Dave Chinner <[email protected]> writes:
>
> Dave> If you are running DIF hardware, then XFS is only OK for direct
> Dave> IO. XFS will still get torn writes if you are overwriting
> Dave> buffered data (either by write() or mmap()) because there are no
> Dave> interlocks to prevent cached pages under writeback from being
> Dave> modified while DMA is being performed.....
>
> Didn't you use to wait_on_page_writeback() in page_mkwrite()?

The generic implementation of ->page_mkwrite (block_page_mkwrite())
which XFS uses has never had a wait_on_page_writeback() call in it.
There's no call in the generic write paths, either, hence my comment
that only direct IO on XFS will work.

It should be noted that metadata updates in XFS are already
protected against torn writes - buffers are held locked during IO,
and can only be modified while holding the lock. Hence the only
problem that needs solving for XFS to make full use of DIF/DIX is
getting the page cache paths to not modify pages under IO...


Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-06-07 16:21:32

by Martin K. Petersen

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

>>>>> "Dave" == Dave Chinner <[email protected]> writes:

>> Didn't you use to wait_on_page_writeback() in page_mkwrite()?

Dave> The generic implementation of ->page_mkwrite
Dave> (block_page_mkwrite()) which XFS uses has never had a
Dave> wait_on_page_writeback() call in it. There's no call in the
Dave> generic write paths, either, hence my comment that only direct IO
Dave> on XFS will work.

I guess that wait_on_page_writeback() was something I added when I used
XFS for DIF testing.

--
Martin K. Petersen Oracle Linux Engineering

2010-06-07 17:22:34

by Boaz Harrosh

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On 06/07/2010 07:20 PM, Martin K. Petersen wrote:
>>>>>> "Dave" == Dave Chinner <[email protected]> writes:
>
>>> Didn't you use to wait_on_page_writeback() in page_mkwrite()?
>
> Dave> The generic implementation of ->page_mkwrite
> Dave> (block_page_mkwrite()) which XFS uses has never had a
> Dave> wait_on_page_writeback() call in it. There's no call in the
> Dave> generic write paths, either, hence my comment that only direct IO
> Dave> on XFS will work.
>
> I guess that wait_on_page_writeback() was something I added when I used
> XFS for DIF testing.
>

Do you remember some performance numbers that show degradation / sameness?

What type of work loads?

Thanks
Boaz

2010-06-07 17:40:43

by Martin K. Petersen

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

>>>>> "Boaz" == Boaz Harrosh <[email protected]> writes:

Boaz> Do you remember some performance numbers that show degradation /
Boaz> sameness?

Boaz> What type of work loads?

I haven't been using XFS much for over a year. I'm using an internal
async I/O tool and btrfs for most of my DIX/DIF testing these days.

But my original changes were along the lines of what Jan mentioned
earlier (hooking into page_mkwrite and waiting for writeback. I could
have sworn that I only did it for ext[23] and that XFS waited out of the
box but git proves me wrong). Anyway, I'll try to get some benchmarking
happening later this week.

This won't fix things completely, though. ext2fs, for instance,
frequently changes metadata buffers in flight so it trips the guard
check in no time.

--
Martin K. Petersen Oracle Linux Engineering

2010-06-08 07:15:39

by Christof Schmitt

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Mon, Jun 07, 2010 at 01:40:21PM -0400, Martin K. Petersen wrote:
> >>>>> "Boaz" == Boaz Harrosh <[email protected]> writes:
>
> Boaz> Do you remember some performance numbers that show degradation /
> Boaz> sameness?
>
> Boaz> What type of work loads?
>
> I haven't been using XFS much for over a year. I'm using an internal
> async I/O tool and btrfs for most of my DIX/DIF testing these days.
>
> But my original changes were along the lines of what Jan mentioned
> earlier (hooking into page_mkwrite and waiting for writeback. I could
> have sworn that I only did it for ext[23] and that XFS waited out of the
> box but git proves me wrong). Anyway, I'll try to get some benchmarking
> happening later this week.

Is there a patch with this change available somewhere? It might be
useful to patch a kernel with this XFS change for reliable DIF/DIX
testing.

> This won't fix things completely, though. ext2fs, for instance,
> frequently changes metadata buffers in flight so it trips the guard
> check in no time.
>
> --
> Martin K. Petersen Oracle Linux Engineering
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-06-08 08:47:25

by Dave Chinner

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Tue, Jun 08, 2010 at 09:15:35AM +0200, Christof Schmitt wrote:
> On Mon, Jun 07, 2010 at 01:40:21PM -0400, Martin K. Petersen wrote:
> > >>>>> "Boaz" == Boaz Harrosh <[email protected]> writes:
> >
> > Boaz> Do you remember some performance numbers that show degradation /
> > Boaz> sameness?
> >
> > Boaz> What type of work loads?
> >
> > I haven't been using XFS much for over a year. I'm using an internal
> > async I/O tool and btrfs for most of my DIX/DIF testing these days.
> >
> > But my original changes were along the lines of what Jan mentioned
> > earlier (hooking into page_mkwrite and waiting for writeback. I could
> > have sworn that I only did it for ext[23] and that XFS waited out of the
> > box but git proves me wrong). Anyway, I'll try to get some benchmarking
> > happening later this week.
>
> Is there a patch with this change available somewhere? It might be
> useful to patch a kernel with this XFS change for reliable DIF/DIX
> testing.

Adding a wait_on_page_writeback() call to page_mkwrite() won't help
by itself - you need to unmap the pages after transitioning them
from dirty to writeback but before the hardware starts processing
the IO if you want to lock out mmap writes this way....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-06-08 08:52:08

by Nick Piggin

[permalink] [raw]
Subject: Re: Wrong DIF guard tag on ext2 write

On Tue, Jun 08, 2010 at 06:47:18PM +1000, Dave Chinner wrote:
> On Tue, Jun 08, 2010 at 09:15:35AM +0200, Christof Schmitt wrote:
> > > But my original changes were along the lines of what Jan mentioned
> > > earlier (hooking into page_mkwrite and waiting for writeback. I could
> > > have sworn that I only did it for ext[23] and that XFS waited out of the
> > > box but git proves me wrong). Anyway, I'll try to get some benchmarking
> > > happening later this week.
> >
> > Is there a patch with this change available somewhere? It might be
> > useful to patch a kernel with this XFS change for reliable DIF/DIX
> > testing.
>
> Adding a wait_on_page_writeback() call to page_mkwrite() won't help
> by itself - you need to unmap the pages after transitioning them
> from dirty to writeback but before the hardware starts processing
> the IO if you want to lock out mmap writes this way....

Actually as Jan pointed out, clear_page_dirty_for_io already does
this. So it seems pretty doable. get_user_pages remains a problem
(which I believe NFS just ignores) but it could be avoided by
just disallowing it.