On 2/23/24 09:42, John Groves wrote:
> One of the key requirements for famfs is that it service vma faults
> efficiently. Our metadata helps - the search order is n for n extents,
> and n is usually 1. But we can still observe gnarly lock contention
> in mm if PTE faults are happening. This commit introduces fault counters
> that can be enabled and read via /sys/fs/famfs/...
>
> These counters have proved useful in troubleshooting situations where
> PTE faults were happening instead of PMD. No performance impact when
> disabled.
This seems kinda wonky. Why does _this_ specific filesystem need its
own fault counters. Seems like something we'd want to do much more
generically, if it is needed at all.
Was the issue here just that vm_ops->fault() was getting called instead
of ->huge_fault()? Or something more subtle?
On 24/02/23 10:23AM, Dave Hansen wrote:
> On 2/23/24 09:42, John Groves wrote:
> > One of the key requirements for famfs is that it service vma faults
> > efficiently. Our metadata helps - the search order is n for n extents,
> > and n is usually 1. But we can still observe gnarly lock contention
> > in mm if PTE faults are happening. This commit introduces fault counters
> > that can be enabled and read via /sys/fs/famfs/...
> >
> > These counters have proved useful in troubleshooting situations where
> > PTE faults were happening instead of PMD. No performance impact when
> > disabled.
>
> This seems kinda wonky. Why does _this_ specific filesystem need its
> own fault counters. Seems like something we'd want to do much more
> generically, if it is needed at all.
>
> Was the issue here just that vm_ops->fault() was getting called instead
> of ->huge_fault()? Or something more subtle?
Thanks for your reply Dave!
First, I'm willing to pull the fault counters out if the brain trust doesn't
like them.
I put them in because we were running benchmarks of computational data
analytics and and noted that jobs took 3x as long on famfs as raw dax -
which indicated I was doing something wrong, because it should be equivalent
or very close.
The the solution was to call thp_get_unmapped_area() in
famfs_file_operations, and performance doesn't vary significantly from raw
dax now. Prior to that I wasn't making sure the mmap address was PMD aligned.
After that I wanted a way to be double-secret-certain that it was servicing
PMD faults as intended. Which it basically always is, so far. (The smoke
tests in user space check this.)
John