This series is a major overhaul of the KAISER patches:
1) Entry code
Mostly the same, except for a handful of fixlets and delta
improvements folded into the corresponding patches
New: Map TSS read only into the user space visible mapping
This is 64bit only, as 32bit needs the TSS mapped RW
AMD confirmed that there is no issue with that. It would be nice to
get confirmation from Intel as well.
2) Namespace
Several people including Linus requested to change the KAISER name.
We came up with a list of technically correct acronyms:
User Address Space Separation, prefix uass_
Forcefully Unmap Complete Kernel With Interrupt Trampolines, prefix fuckwit_
but we are politically correct people so we settled for
Kernel Page Table Isolation, prefix kpti_
Linus, your call :)
3) The actual isolation patches
- Replaced the magic kaiser_add/remove_mapping() code by mapping everything
which needs to be shared with user space into the fixmap
- PMD aligned the shared fixmap so the PTE page can be shared between
user and kernel space page tables
- Integrated all fixes and Peter's rewrite of the PCID/TLB flush code.
- Restructured the patch set in a way that it is simpler to review
- Got rid of the strange wording of shadow page tables, because they are
not shadowish at all. KASAN, virt etc. use shadows, but these tables
are actively in use and an integral part of the functionality
- Moved the debugfs files into a new directory so they don't clutter the
debugfs root directory.
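
The PMD alignment point above can be illustrated with a little address
arithmetic. This is only a sketch, assuming x86-64 with 4 KiB pages, where
one PTE page holds 512 entries and therefore maps exactly one PMD-sized
(2 MiB) range. The helper names are hypothetical and are not taken from the
patches; the constants mirror the usual x86-64 values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 4-level paging constants (4 KiB pages). */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_SIZE   (1ULL << PMD_SHIFT)   /* 2 MiB covered by one PTE page */
#define PMD_MASK   (~(PMD_SIZE - 1))

/* Hypothetical helper: true if addr sits on a 2 MiB boundary. */
static inline bool pmd_aligned(uint64_t addr)
{
	return (addr & ~PMD_MASK) == 0;
}

/* Hypothetical helper: round addr up to the next 2 MiB boundary. */
static inline uint64_t pmd_align_up(uint64_t addr)
{
	return (addr + PMD_SIZE - 1) & PMD_MASK;
}

/* Hypothetical helper: number of PTE pages needed to map [start, start + size). */
static inline uint64_t pte_pages_spanned(uint64_t start, uint64_t size)
{
	uint64_t first = start >> PMD_SHIFT;
	uint64_t last = (start + size - 1) >> PMD_SHIFT;

	return last - first + 1;
}
```

If the shared region starts on a PMD boundary, the PTE page behind it maps
nothing but that region, so the same PTE page can be referenced from the PMD
entries of both the kernel and the user page tables. An unaligned region
would share its PTE page with unrelated kernel mappings.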
LIMITATIONS:
- allmod/yes config builds fail right now because the fixmap grows
too large and breaks the EFI assumptions. This is still being investigated.
A possible solution is just to use one of the address space holes
and grab a separate pgdir to map the cpu entry area. Not hard to do
and it won't change much of the principle of these patches.
TODOs:
- This needs a thorough review again. Sorry.
- Please verify that all fixlets have been integrated. The mail threads
are horribly scattered so I might have missed something.
- Rewrite documentation. I dropped the documentation patch as it no
longer applies.
- Handle native vsyscalls. Right now the patch set supports only
emulation, but it should be possible to support native as well.
Nothing urgent; I'd rather kill them completely.
- Populate a branch with minimal prerequisite patches to apply.
Thanks to Andy Lutomirski, Peter Zijlstra, Ingo Molnar, Borislav Petkov and
Dave Hansen for all the help with this.
The patches apply on top of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent
and are available from git in
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.x86/kpti
and as tarball from
https://tglx.de/~tglx/patches-kpti-119.tar.bz2
Signature file for the uncompressed tarball
https://tglx.de/~tglx/patches-kpti-119.tar.sig
Thanks,
tglx
On Mon, Dec 4, 2017 at 6:07 AM, Thomas Gleixner <[email protected]> wrote:
>
> Kernel Page Table Isolation, prefix kpti_
>
> Linus, your call :)
I think you probably chose the right name here. The alternatives sound
intriguing, but probably not the right thing to do.
How much of this is considered worth trying to integrate early?
Clearly I'm not taking all of it.
Linus
On Mon, 4 Dec 2017, Linus Torvalds wrote:
> On Mon, Dec 4, 2017 at 6:07 AM, Thomas Gleixner <[email protected]> wrote:
> >
> > Kernel Page Table Isolation, prefix kpti_
> >
> > Linus, your call :)
>
> I think you probably chose the right name here. The alternatives sound
> intriguing, but probably not the right thing to do.
>
> How much of this is considered worth trying to integrate early?
Probably the entry changes, but we need to sort out that fixmap issue first
and that affects the entry changes as well. Give me a day or two and I can
tell you.
> Clearly I'm not taking all of it.
I did not expect that.
Thanks,
tglx
On Mon, Dec 4, 2017 at 10:18 AM, Thomas Gleixner <[email protected]> wrote:
>>
>> How much of this is considered worth trying to integrate early?
>
> Probably the entry changes, but we need to sort out that fixmap issue first
> and that affects the entry changes as well. Give me a day or two and I can
> tell you.
Sure. I've skimmed through the patches, and a number of the early ones
seem to be "obviously safe and independently nice cleanups". Even the
sysenter stack setup etc that isn't really required without the other
work seems sane and fine.
In fact, I have to say that the patches themselves look very good.
Nothing made me go "Christ, what an ugly hack". Maybe that is because
of just the skimming through, but still, it was not an unpleasant
read-through.
The problem, of course, is how *subtle* all the interactions are, and
how one missed "oh, the CPU also needs this" makes for some really
nasty breakage. So it may all look nice and clean, and then blow up
horribly in some very particular configuration.
And yes, paravirtualization is evil.
Linus
On 12/04/2017 01:18 PM, Thomas Gleixner wrote:
> On Mon, 4 Dec 2017, Linus Torvalds wrote:
>> On Mon, Dec 4, 2017 at 6:07 AM, Thomas Gleixner <[email protected]> wrote:
>>> Kernel Page Table Isolation, prefix kpti_
>>>
>>> Linus, your call :)
>> I think you probably chose the right name here. The alternatives sound
>> intriguing, but probably not the right thing to do.
>>
>> How much of this is considered worth trying to integrate early?
> Probably the entry changes, but we need to sort out that fixmap issue first
> and that affects the entry changes as well. Give me a day or two and I can
> tell you.
This series breaks Xen PV.
When I tested it last time it was patch 17 (of this series). I don't
know whether it breaks now due to the same patch, I haven't had a chance
to look into this yet, sorry.
-boris
>> Clearly I'm not taking all of it.
> I did not expect that.
>
> Thanks,
>
> tglx
Random thought for the future: KPTI will make it possible to avoid
global IPI broadcasts on kernel flushes as we discussed, incorrectly,
two years ago at LPC. This could be nice.
On 12/05/2017 01:49 PM, Andy Lutomirski wrote:
> Random thought for the future: KPTI will make it possible to avoid
> global IPI broadcasts on kernel flushes as we discussed, incorrectly,
> two years ago at LPC. This could be nice.
I'm slow. How?
On Tue, Dec 5, 2017 at 1:57 PM, Dave Hansen <[email protected]> wrote:
> On 12/05/2017 01:49 PM, Andy Lutomirski wrote:
>> Random thought for the future: KPTI will make it possible to avoid
>> global IPI broadcasts on kernel flushes as we discussed, incorrectly,
>> two years ago at LPC. This could be nice.
>
> I'm slow. How?
>
By introducing an (optional) atomic check for need-to-flush on
switches from user CR3 to kernel CR3.
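
A rough userspace sketch of that idea, using C11 atomics. Everything here
is hypothetical: the struct, the function names and the flush counter are
stand-ins for illustration and do not exist in the kernel. It also assumes
the target CPU is currently running in user mode; a CPU that is already in
the kernel would still need to be notified some other way.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* arbitrary size for the sketch */

/* Hypothetical per-CPU state. */
struct cpu_flush_state {
	atomic_bool need_flush;	/* set remotely instead of sending an IPI */
	unsigned long flushes;	/* stand-in for real kernel TLB flushes */
};

static struct cpu_flush_state cpu_state[NR_CPUS];

/* Remote side: mark that 'cpu' must flush before it next runs kernel code. */
static void request_kernel_flush(int cpu)
{
	atomic_store_explicit(&cpu_state[cpu].need_flush, true,
			      memory_order_release);
}

/*
 * Entry side: called on the switch from user CR3 to kernel CR3.
 * Returns true if a deferred flush was pending and has been performed.
 */
static bool kernel_entry_check_flush(int cpu)
{
	if (atomic_exchange_explicit(&cpu_state[cpu].need_flush, false,
				     memory_order_acquire)) {
		cpu_state[cpu].flushes++;	/* a real flush would go here */
		return true;
	}
	return false;
}
```

The exchange clears the flag and reads it in one atomic step, so a request
that races with kernel entry is either handled on this entry or left set
for the next one; it is never lost.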
Should KPTI have a MAINTAINERS entry?
Neil Berrington (cc'ed) is reporting "Double fault in load_new_mm_cr3 with KPTI
enabled" at https://bugzilla.kernel.org/show_bug.cgi?id=198517
On 01/19/2018 12:56 PM, Andrew Morton wrote:
> Should KPTI have a MAINTAINERS entry?
>
> Neil Berrington (cc'ed) is reporting "Double fault in load_new_mm_cr3 with KPTI
> enabled" at https://bugzilla.kernel.org/show_bug.cgi?id=198517
Seems sane to me. There have been quite a few patches I wish I'd been
cc'd on along the way. I think Andy L in particular is probably way
under-cc'd on x86 stuff in general.
A better long-term solution (that others have suggested) is probably to
create an [email protected] or something.
On Fri, 19 Jan 2018, Andrew Morton wrote:
> Should KPTI have a MAINTAINERS entry?
I don't think so. It's all x86 core code which has a maintainer entry.
> Neil Berrington (cc'ed) is reporting "Double fault in load_new_mm_cr3 with KPTI
> enabled" at https://bugzilla.kernel.org/show_bug.cgi?id=198517
Neil, the screenshot shows that this is on an Ubuntu 4.13 something
kernel. Can you reproduce on 4.14.14 or on Linus' latest?
Thanks,
tglx