Ingo, Thomas, Al, any concerns with this series?
On Thu, May 3, 2018 at 5:06 PM, Dan Williams <[email protected]> wrote:
> Changes since v2 [1]:
>
> * Fix source address increment in mcsafe_handle_tail() (Mika)
>
> * Extend the unit test to inject simulated write faults and validate
> that data is properly transferred.
>
> * Rename MCSAFE_DEBUG to MCSAFE_TEST
>
> [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015583.html
>
> ---
>
> Currently memcpy_mcsafe() is only deployed in the pmem driver when
> reading through a /dev/pmemX block device. However, a filesystem in dax
> mode mounted on a /dev/pmemX block device will bypass the block layer
> and the driver for reads. The filesystem-dax (fsdax) read case uses
> dax_direct_access() and copy_to_iter() to bypass the block layer.
>
> The result of the bypass is that the kernel treats machine checks during
> read as system fatal (reboot) when they could simply be flagged as an
> I/O error, similar to performing reads through the pmem driver. Prevent
> this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> path.
>
> The main differences between this copy_to_user_mcsafe() and
> copy_user_generic_unrolled() are:
>
> * Typical tail/residue handling after a fault retries the copy
> byte-by-byte until the fault happens again. Re-triggering machine
> checks is potentially fatal so the implementation uses source alignment
> and poison alignment assumptions to avoid re-triggering machine
> checks.
>
> * SMAP coordination is handled external to the assembly with
> __uaccess_begin() and __uaccess_end().
>
> * ITER_KVEC and ITER_BVEC can now end prematurely with an error.
>
> The new MCSAFE_TEST facility is proposed as a way to unit test the
> exception handling without requiring an ACPI EINJ capable platform.
>
> ---
>
> Dan Williams (9):
> x86, memcpy_mcsafe: remove loop unrolling
> x86, memcpy_mcsafe: add labels for write fault handling
> x86, memcpy_mcsafe: return bytes remaining
> x86, memcpy_mcsafe: add write-protection-fault handling
> x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
> dax: introduce a ->copy_to_iter dax operation
> dax: report bytes remaining in dax_iomap_actor()
> pmem: switch to copy_to_iter_mcsafe()
> x86, nfit_test: unit test for memcpy_mcsafe()
>
>
> arch/x86/Kconfig | 1
> arch/x86/Kconfig.debug | 3 +
> arch/x86/include/asm/mcsafe_test.h | 75 ++++++++++++++++++++++++
> arch/x86/include/asm/string_64.h | 10 ++-
> arch/x86/include/asm/uaccess_64.h | 14 +++++
> arch/x86/lib/memcpy_64.S | 112 +++++++++++++++++-------------------
> arch/x86/lib/usercopy_64.c | 21 +++++++
> drivers/dax/super.c | 10 +++
> drivers/md/dm-linear.c | 16 +++++
> drivers/md/dm-log-writes.c | 15 +++++
> drivers/md/dm-stripe.c | 21 +++++++
> drivers/md/dm.c | 25 ++++++++
> drivers/nvdimm/claim.c | 3 +
> drivers/nvdimm/pmem.c | 13 +++-
> drivers/s390/block/dcssblk.c | 7 ++
> fs/dax.c | 21 ++++---
> include/linux/dax.h | 5 ++
> include/linux/device-mapper.h | 5 +-
> include/linux/string.h | 4 +
> include/linux/uio.h | 15 +++++
> lib/iov_iter.c | 61 ++++++++++++++++++++
> tools/testing/nvdimm/test/nfit.c | 104 +++++++++++++++++++++++++++++++++
> 22 files changed, 482 insertions(+), 79 deletions(-)
> create mode 100644 arch/x86/include/asm/mcsafe_test.h
> _______________________________________________
> Linux-nvdimm mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/linux-nvdimm
* Dan Williams <[email protected]> wrote:
> Ingo, Thomas, Al, any concerns with this series?
Yeah, so:
"[PATCH v3 0/9] Series short description"
... isn't the catchiest of titles to capture my [all too easily distracted]
attention! ;-)
I have marked it now for -tip processing. Linus was happy with this and acked the
approach, right?
Thanks,
Ingo
>
> On Thu, May 3, 2018 at 5:06 PM, Dan Williams <[email protected]> wrote:
> > Changes since v2 [1]:
> >
> > * Fix source address increment in mcsafe_handle_tail() (Mika)
> >
> > * Extend the unit test to inject simulated write faults and validate
> > that data is properly transferred.
> >
> > * Rename MCSAFE_DEBUG to MCSAFE_TEST
> >
> > [1]: https://lists.01.org/pipermail/linux-nvdimm/2018-May/015583.html
> >
> > ---
> >
> > Currently memcpy_mcsafe() is only deployed in the pmem driver when
> > reading through a /dev/pmemX block device. However, a filesystem in dax
> > mode mounted on a /dev/pmemX block device will bypass the block layer
> > and the driver for reads. The filesystem-dax (fsdax) read case uses
> > dax_direct_access() and copy_to_iter() to bypass the block layer.
> >
> > The result of the bypass is that the kernel treats machine checks during
> > read as system fatal (reboot) when they could simply be flagged as an
> > I/O error, similar to performing reads through the pmem driver. Prevent
> > this fatal condition by deploying memcpy_mcsafe() in the fsdax read
> > path.
> >
> > The main differences between this copy_to_user_mcsafe() and
> > copy_user_generic_unrolled() are:
> >
> > * Typical tail/residue handling after a fault retries the copy
> > byte-by-byte until the fault happens again. Re-triggering machine
> > checks is potentially fatal so the implementation uses source alignment
> > and poison alignment assumptions to avoid re-triggering machine
> > checks.
> >
> > * SMAP coordination is handled external to the assembly with
> > __uaccess_begin() and __uaccess_end().
> >
> > * ITER_KVEC and ITER_BVEC can now end prematurely with an error.
> >
> > The new MCSAFE_TEST facility is proposed as a way to unit test the
> > exception handling without requiring an ACPI EINJ capable platform.
> >
> > ---
> >
> > Dan Williams (9):
> > x86, memcpy_mcsafe: remove loop unrolling
> > x86, memcpy_mcsafe: add labels for write fault handling
> > x86, memcpy_mcsafe: return bytes remaining
> > x86, memcpy_mcsafe: add write-protection-fault handling
> > x86, memcpy_mcsafe: define copy_to_iter_mcsafe()
> > dax: introduce a ->copy_to_iter dax operation
> > dax: report bytes remaining in dax_iomap_actor()
> > pmem: switch to copy_to_iter_mcsafe()
> > x86, nfit_test: unit test for memcpy_mcsafe()
> >
> >
> > arch/x86/Kconfig | 1
> > arch/x86/Kconfig.debug | 3 +
> > arch/x86/include/asm/mcsafe_test.h | 75 ++++++++++++++++++++++++
> > arch/x86/include/asm/string_64.h | 10 ++-
> > arch/x86/include/asm/uaccess_64.h | 14 +++++
> > arch/x86/lib/memcpy_64.S | 112 +++++++++++++++++-------------------
> > arch/x86/lib/usercopy_64.c | 21 +++++++
> > drivers/dax/super.c | 10 +++
> > drivers/md/dm-linear.c | 16 +++++
> > drivers/md/dm-log-writes.c | 15 +++++
> > drivers/md/dm-stripe.c | 21 +++++++
> > drivers/md/dm.c | 25 ++++++++
> > drivers/nvdimm/claim.c | 3 +
> > drivers/nvdimm/pmem.c | 13 +++-
> > drivers/s390/block/dcssblk.c | 7 ++
> > fs/dax.c | 21 ++++---
> > include/linux/dax.h | 5 ++
> > include/linux/device-mapper.h | 5 +-
> > include/linux/string.h | 4 +
> > include/linux/uio.h | 15 +++++
> > lib/iov_iter.c | 61 ++++++++++++++++++++
> > tools/testing/nvdimm/test/nfit.c | 104 +++++++++++++++++++++++++++++++++
> > 22 files changed, 482 insertions(+), 79 deletions(-)
> > create mode 100644 arch/x86/include/asm/mcsafe_test.h
> > _______________________________________________
> > Linux-nvdimm mailing list
> > [email protected]
> > https://lists.01.org/mailman/listinfo/linux-nvdimm
On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <[email protected]> wrote:
>
> * Dan Williams <[email protected]> wrote:
>
>> Ingo, Thomas, Al, any concerns with this series?
>
> Yeah, so:
>
> "[PATCH v3 0/9] Series short description"
>
> ... isn't the catchiest of titles to capture my [all too easily distracted]
> attention! ;-)
My bad! After that mistake it became a toss-up between more spam and
hoping the distraction would not throw you off.
> I have marked it now for -tip processing. Linus was happy with this and acked the
> approach, right?
I think "happy" is a strong word when it comes to x86 machine check
handling. My interpretation is that he and Andy acquiesced that this
is about the best we can do with dax+mce as things stand today.
> On May 14, 2018, at 8:52 AM, Dan Williams <[email protected]> wrote:
>
>
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.
Agreed. I think it’s plausible that we could do slightly better for MCEs related to bad memory accessed through the direct map, but the code would be complicated and miserable to test, and it’s not even clear that it would materially decrease the chance that we survive as compared to these patches.
On Mon, May 14, 2018 at 8:52 AM Dan Williams <[email protected]>
wrote:
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.
Yeah. I definitely wouldn't say "happy", but perhaps "resigned". There's
nothing horrible going on, except for just plain nasty interfaces and
apparently disgusting hw problems wrt "rep movs/stos".
Linus
* Dan Williams <[email protected]> wrote:
> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <[email protected]> wrote:
> >
> > * Dan Williams <[email protected]> wrote:
> >
> >> Ingo, Thomas, Al, any concerns with this series?
> >
> > Yeah, so:
> >
> > "[PATCH v3 0/9] Series short description"
> >
> > ... isn't the catchiest of titles to capture my [all too easily distracted]
> > attention! ;-)
>
> My bad! After that mistake it became a toss-up between more spam and
> hoping the distraction would not throw you off.
>
> > I have marked it now for -tip processing. Linus was happy with this and acked the
> > approach, right?
>
> I think "happy" is a strong word when it comes to x86 machine check
> handling. My interpretation is that he and Andy acquiesced that this
> is about the best we can do with dax+mce as things stand today.
So, how would you like to go about this series?
To help move it forward I applied the first 5 commits to tip:x86/dax, on a
vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
superficially (I don't have DAX so this essentially means build tests) and
pushed out the result.
Barring some later generic-x86 regression (unlikely) this looks good to me - feel
free to cross-pull that branch into your DAX/nvdimm tree.
Or we could apply the remaining changes to -tip too - your call.
Thanks,
Ingo
On Mon, May 14, 2018 at 11:49 PM, Ingo Molnar <[email protected]> wrote:
>
> * Dan Williams <[email protected]> wrote:
>
>> On Mon, May 14, 2018 at 12:26 AM, Ingo Molnar <[email protected]> wrote:
>> >
>> > * Dan Williams <[email protected]> wrote:
>> >
>> >> Ingo, Thomas, Al, any concerns with this series?
>> >
>> > Yeah, so:
>> >
>> > "[PATCH v3 0/9] Series short description"
>> >
>> > ... isn't the catchiest of titles to capture my [all too easily distracted]
>> > attention! ;-)
>>
>> My bad! After that mistake it became a toss-up between more spam and
>> hoping the distraction would not throw you off.
>>
>> > I have marked it now for -tip processing. Linus was happy with this and acked the
>> > approach, right?
>>
>> I think "happy" is a strong word when it comes to x86 machine check
>> handling. My interpretation is that he and Andy acquiesced that this
>> is about the best we can do with dax+mce as things stand today.
>
> So, how would you like to go about this series?
>
> To help move it forward I applied the first 5 commits to tip:x86/dax, on a
> vanilla v4.17-rc5 base, did some minor edits to the changelogs, tested it
> superficially (I don't have DAX so this essentially means build tests) and
> pushed out the result.
Thanks for that. Technically speaking, you do have dax, but setting up
our unit tests is currently not friction free, so I would not expect
you to go through that effort. Hopefully we can revive 0day running
our unit tests one of these days.
> Barring some later generic-x86 regression (unlikely) this looks good to me - feel
> free to cross-pull that branch into your DAX/nvdimm tree.
>
> Or we could apply the remaining changes to -tip too - your call.
The remainder patches have developed a conflict with another topic
branch in the nvdimm tree, in particular "dax: introduce a
->copy_to_iter dax operation". I think the best course is for me to
rebase the remaining 4 on top of tip/x86/dax and carry the merge
conflict through the nvdimm tree.
> Thanks,
>
> Ingo
Thanks!