As of Linux 6.10-rc the MM can swap out larger than page size chunks.
NFS has all code ready to handle this, but has a VM_BUG_ON that
triggers when this happens. Simply remove the VM_BUG_ON to fix this
use case.
Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/nfs/direct.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index bb2f583eb28bf1..90079ca134dd3c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -141,8 +141,6 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
{
ssize_t ret;
- VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
-
if (iov_iter_rw(iter) == READ)
ret = nfs_file_direct_read(iocb, iter, true);
else
--
2.43.0
On Fri, 2024-06-14 at 12:03 +0200, Christoph Hellwig wrote:
> As of Linux 6.10-rc the MM can swap out larger than page size chunks.
> NFS has all code ready to handle this, but has a VM_BUG_ON that
> triggers when this happens. Simply remove the VM_BUG_ON to fix this
> use case.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> fs/nfs/direct.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index bb2f583eb28bf1..90079ca134dd3c 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -141,8 +141,6 @@ int nfs_swap_rw(struct kiocb *iocb, struct
> iov_iter *iter)
> {
> ssize_t ret;
>
> - VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
> -
> if (iov_iter_rw(iter) == READ)
> ret = nfs_file_direct_read(iocb, iter, true);
> else
This definitely seems wrong in a large folio world.
Reviewed-by: Jeff Layton <[email protected]>
On Fri, 14 Jun 2024 12:03:25 +0200 Christoph Hellwig <[email protected]> wrote:
> As of Linux 6.10-rc the MM can swap out larger than page size chunks.
> NFS has all code ready to handle this, but has a VM_BUG_ON that
> triggers when this happens. Simply remove the VM_BUG_ON to fix this
> use case.
>
> ...
>
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -141,8 +141,6 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
> {
> ssize_t ret;
>
> - VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
> -
> if (iov_iter_rw(iter) == READ)
> ret = nfs_file_direct_read(iocb, iter, true);
> else
I'm thinking this should precede "mm: swap: entirely map large folios
found in swapcache", or be a part of it.
Barry/Chuanhua, any opinions?
On Sun, Jun 16, 2024 at 5:55 AM Andrew Morton <[email protected]> wrote:
>
> On Fri, 14 Jun 2024 12:03:25 +0200 Christoph Hellwig <[email protected]> wrote:
>
> > As of Linux 6.10-rc the MM can swap out larger than page size chunks.
> > NFS has all code ready to handle this, but has a VM_BUG_ON that
> > triggers when this happens. Simply remove the VM_BUG_ON to fix this
> > use case.
As I understand it, this isn't happening because we don't support
mTHP swapping out to a swapfile, whether it's on NFS or any
other filesystem.
static int scan_swap_map_slots(struct swap_info_struct *si,
unsigned char usage, int nr,
swp_entry_t slots[], int order)
{
...
if (order > 0) {
...
/*
* Swapfile is not block device or not using clusters so unable
* to allocate large entries.
*/
if (!(si->flags & SWP_BLKDEV) || !si->cluster_info)
return 0;
}
}
However, I'm pleased to see this patch, as we might need it someday.
> >
> > ...
> >
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -141,8 +141,6 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
> > {
> > ssize_t ret;
> >
> > - VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
> > -
> > if (iov_iter_rw(iter) == READ)
> > ret = nfs_file_direct_read(iocb, iter, true);
> > else
>
> I'm thinking this should precede "mm: swap: entirely map large folios
> found in swapcache", or be a part of it.
>
> Barry/Chuanhua, any opinions?
The patch “mm: swap: entirely map large folios found in swapcache” does not
perform any I/O operations; it only maps the accessed large folio within the
swapcache. Therefore, this patch can be considered independent.
BTW, we haven't supported swapping out mTHP to nfs swapfile.
So, this patch is future-proof and can serve as a precedent if we
want to extend mthp swapout/swpin to non-blkdev systems.
Please correct me if I'm wrong, Ryan.
Thanks
Barry
On Sun, Jun 16, 2024 at 12:16:10PM +1200, Barry Song wrote:
> As I understand it, this isn't happening because we don't support
> mTHP swapping out to a swapfile, whether it's on NFS or any
> other filesystem.
It does happen. The reason why I sent this patch is becaue I observed
the BUG_ON trigger on a trivial swap generation workload (usemem.c from
xfstests).
On Sun, Jun 16, 2024 at 4:54 PM Christoph Hellwig <[email protected]> wrote:
>
> On Sun, Jun 16, 2024 at 12:16:10PM +1200, Barry Song wrote:
> > As I understand it, this isn't happening because we don't support
> > mTHP swapping out to a swapfile, whether it's on NFS or any
> > other filesystem.
>
> It does happen. The reason why I sent this patch is becaue I observed
> the BUG_ON trigger on a trivial swap generation workload (usemem.c from
> xfstests).
This is quite unusual. Could you share your setup and backtrace? I'd
like to reproduce the issue, as the mm code only supports mTHP
swapout on block devices. What is your swap device or swap file?
Additionally, on what kind of filesystem is the executable file built
from usemem.c located?
>