2023-11-17 04:10:28

by ChenXiaoSong

Subject: Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"

On 2023/10/30 22:56, Trond Myklebust wrote:
> A refactoring is by definition a change that does not affect code
> behaviour. It is obvious that this was never intended to be such a
> patch.
>
> The reason that the bug is occurring in 4.19.x, and not in the latest
> kernels, is because the former is missing another bugfix (one which
> actually is missing a "Fixes:" tag).
>
> Can you therefore please check if applying commit 22876f540bdf ("NFS:
> Don't call generic_error_remove_page() while holding locks") fixes the
> issue.
>
> Note that the latter patch is needed in any case in order to fix a read
> deadlock (as indicated on the label).
>
> Thanks,
> Trond

Sorry, the previous email had formatting issues. I'll resend it.


After applying commit 22876f540bdf ("NFS: Don't call
generic_error_remove_page() while holding locks"), I encountered an
infinite loop:

write
  ...
  nfs_updatepage
    nfs_writepage_setup
      nfs_setup_write_request
        nfs_try_to_update_request
          nfs_wb_page
            if (clear_page_dirty_for_io(page)) // true
            nfs_writepage_locked // return 0
              nfs_do_writepage // return 0
                nfs_page_async_flush // return 0
                  nfs_error_is_fatal_on_server
                  nfs_write_error_remove_page
                    SetPageError // instead of generic_error_remove_page
            // loop begin
            if (clear_page_dirty_for_io(page)) // false
            if (!PagePrivate(page)) // false
            ret = nfs_commit_inode // return 0
            // loop again, never quit


Before commit 22876f540bdf ("NFS: Don't call
generic_error_remove_page() while holding locks") is applied,
generic_error_remove_page() clears PG_private, so the infinite loop
never happens:

generic_error_remove_page
  truncate_inode_page
    truncate_cleanup_page
      do_invalidatepage
        nfs_invalidate_page
          nfs_wb_page_cancel
            nfs_inode_remove_request
              ClearPagePrivate(head->wb_page)


If this patch is applied, are other patches also required? Also, I
cannot reproduce the read deadlock that the patch is meant to fix. Are
there specific conditions required to reproduce it?