Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754236AbdL2Aa6 (ORCPT ); Thu, 28 Dec 2017 19:30:58 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:38217 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751253AbdL2Aa5 (ORCPT ); Thu, 28 Dec 2017 19:30:57 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: hpa@zytor.com Cc: Linus Torvalds , Alexandru Chirvasitu , Ingo Molnar , Thomas Gleixner , Andy Lutomirski , kernel list , Borislav Petkov , Brian Gerst , Denys Vlasenko , Josh Poimboeuf , Peter Zijlstra , Steven Rostedt References: <87a7y25v1o.fsf@xmission.com> <2B58A32E-73ED-4D1C-9C12-28941B321FE4@zytor.com> Date: Thu, 28 Dec 2017 18:30:21 -0600 In-Reply-To: <2B58A32E-73ED-4D1C-9C12-28941B321FE4@zytor.com> (hpa@zytor.com's message of "Thu, 28 Dec 2017 15:23:32 -0800") Message-ID: <87efne4bqa.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1eUiZa-0005j9-Av;;;mid=<87efne4bqa.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=67.3.133.177;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1+awpS0GSe8BFU2Fsrjwu+Ew3qI0h6GhYs= X-SA-Exim-Connect-IP: 67.3.133.177 X-SA-Exim-Mail-From: ebiederm@xmission.com X-Spam-Report: * -1.0 ALL_TRUSTED Passed through trusted hosts only via SMTP * 0.7 XMSubLong Long Subject * 0.0 TVD_RCVD_IP Message was received from an IP address * 0.0 T_TM2_M_HEADER_IN_MSG BODY: No description available. * 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60% * [score: 0.5000] * -0.0 DCC_CHECK_NEGATIVE Not listed in DCC * [sa07 1397; Body=1 Fuz1=1 Fuz2=1] * 0.1 XMSolicitRefs_0 Weightloss drug * 0.0 T_TooManySym_03 6+ unique symbols in subject * 0.0 T_TooManySym_02 5+ unique symbols in subject * 0.0 T_TooManySym_01 4+ unique symbols in subject X-Spam-DCC: XMission; sa07 1397; Body=1 Fuz1=1 Fuz2=1 X-Spam-Combo: ;hpa@zytor.com X-Spam-Relay-Country: X-Spam-Timing: total 1220 ms - load_scoreonly_sql: 0.03 (0.0%), signal_user_changed: 2.8 (0.2%), b_tie_ro: 1.96 (0.2%), parse: 1.09 (0.1%), extract_message_metadata: 21 (1.8%), get_uri_detail_list: 4.8 (0.4%), tests_pri_-1000: 8 (0.6%), tests_pri_-950: 1.21 (0.1%), tests_pri_-900: 0.97 (0.1%), tests_pri_-400: 36 (2.9%), check_bayes: 35 (2.8%), b_tokenize: 13 (1.1%), b_tok_get_all: 12 (1.0%), b_comp_prob: 3.7 (0.3%), b_tok_touch_all: 3.8 (0.3%), b_finish: 0.68 (0.1%), tests_pri_0: 354 (29.0%), check_dkim_signature: 0.53 (0.0%), check_dkim_adsp: 2.4 (0.2%), tests_pri_500: 792 (64.9%), poll_dns_idle: 783 (64.2%), rewrite_mail: 0.00 (0.0%) Subject: Re: [PATCH] x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR) X-Spam-Flag: No X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5999 Lines: 156 hpa@zytor.com writes: > On December 28, 2017 2:47:47 PM PST, ebiederm@xmission.com wrote: >>Linus Torvalds writes: >> >>> From: Linus Torvalds >>> Date: Wed, 27 Dec 2017 11:41:30 -0800 >>> Subject: [PATCH] x86-32: fix kexec with stack canary >>(CONFIG_CC_STACKPROTECTOR) >>> >>> Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned >>up >>> and unified the IDT invalidation that existed in a couple of places. >>It >>> changed no actual real code. >>> >>> Despite not changing any actual real code, it _did_ change code >>> generation: by implementing the common idt_invalidate() function in >>> archx86/kernel/idt.c, it made the use of the function in >>> arch/x86/kernel/machine_kexec_32.c be a real function call rather >>than >>> an (accidental) inlining of the function. >>> >>> That, in turn, exposed two issues: >>> >>> - in load_segments(), we had incorrectly reset all the segment >>> registers, which then made the stack canary load (which gcc does >>> using offset of %gs) cause a trap. Instead of %gs pointing to the >>> stack canary, it will be the normal zero-based kernel segment, and >>> the stack canary load will take a page fault at address 0x14. >>> >>> - to make this even harder to debug, we had invalidated the GDT just >>> before calling idt_invalidate(), which meant that the fault >>happened >>> with an invalid GDT, which in turn causes a triple fault and >>> immediate reboot. >>> >>> Fix this by >>> >>> (a) not reloading the special segments in load_segments(). We >>currently >>> don't do any percpu accesses (which would require %fs on x86-32) >>in >>> this area, but there's no reason to think that we might not want >>to >>> do them, and like %gs, it's pointless to break it. >>> >>> (b) doing idt_invalidate() before invalidating the GDT, to keep >>things >>> at least _slightly_ more debuggable for a bit longer. Without a >>> IDT, traps will not work. Without a GDT, traps also will not >>work, >>> but neither will any segment loads etc. So in a very real sense, >>> the GDT is even more core than the IDT. >>> >>> Reported-and-tested-by: Alexandru Chirvasitu >>> Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation") >>> Cc: Thomas Gleixner >>> Cc: Andy Lutomirski >>> Cc: Peter Anvin >>> Cc: Ingo Molnar >>> Signed-off-by: Linus Torvalds >>> --- >>> >>> I wrote "Reported-and-tested-by: Alexandru" because while this isn't >>> exactly the same patch as anything Alexandru tested, it's pretty >>close, >>> and I'm pretty sure this version will fix his issues too. >>> >>> I decided to try to just do the minimal changes: the GDT invalidation >>last >>> (because of the debugging) and _only_ removing the resetting of fs/gs >> >>> rather than removing load_segments() entirely. >>> >>> I think making idt_invalidate() be inline would be a good thing as >>well, >>> and I do think that all those "phys_to_virt(0)" things are garbage, >>but I >>> also think they are independent issues, so I didn't touch any of >>that. >>> >>> I'm assuming I'll get this patch back through the x86 tree, and will >>not >>> be applying it to my own git tree unless the x86 people ask me to. >>> >>> Comments? >> >>There is one significant problem with this patch. It changes the ABI >>that kexec provides to the next kernel. >> >>That ABI is that the segments will be set to a well defined value. >>That value is flat 32bit segments with a base address of 0. >> >>By removing %fs and %gs from load_segments they are now effectively >>random undefined values, to the next kernel. >> >>I don't know if anything actually cares. But if they do they are now >>broken. It is easy enough to preserve that invariant I don't see >>a point in risking potential breaking and looking to see if we have >>actually broken the ABI. >> >>It feels like this is something we should move into assembly rather >>than attempting to cater to the changing evironment of C code in the >>kernel. Or if not we need a big fat comment be very very careful >>this code is special. >> >>Eric >> >> >>> arch/x86/kernel/machine_kexec_32.c | 4 +--- >>> 1 file changed, 1 insertion(+), 3 deletions(-) >>> >>> diff --git a/arch/x86/kernel/machine_kexec_32.c >>b/arch/x86/kernel/machine_kexec_32.c >>> index 00bc751c861c..edfede768688 100644 >>> --- a/arch/x86/kernel/machine_kexec_32.c >>> +++ b/arch/x86/kernel/machine_kexec_32.c >>> @@ -48,8 +48,6 @@ static void load_segments(void) >>> "\tmovl $"STR(__KERNEL_DS)",%%eax\n" >>> "\tmovl %%eax,%%ds\n" >>> "\tmovl %%eax,%%es\n" >>> - "\tmovl %%eax,%%fs\n" >>> - "\tmovl %%eax,%%gs\n" >>> "\tmovl %%eax,%%ss\n" >>> : : : "eax", "memory"); >>> #undef STR >>> @@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image) >>> * The gdt & idt are now invalid. >>> * If you want to load them you must set up your own idt & gdt. >>> */ >>> - set_gdt(phys_to_virt(0), 0); >>> idt_invalidate(phys_to_virt(0)); >>> + set_gdt(phys_to_virt(0), 0); >>> >>> /* now call it */ >>> image->start = relocate_kernel_ptr((unsigned long)image->head, > > The ABI the kernel requires on entry is also documented, and we should > stick to that. Wrong interface this, does not transfer directly to a linux kernel. This transfers to a shim that starts a linux kernel or something else. It is way past time to be having design discussions about what this interface should do. It is more than a decade old. > That being said, the bottom line is to just stop putting these kinds > of final handovers into C and just hope the compiler (or > tracing/debugging developers) doesn't randomly break at some thing. In general I agree. That makes the code a little less approachable, but it would seem to remove the chance of surprise interactions even more. Eric