Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S967497AbeAORYC (ORCPT + 1 other); Mon, 15 Jan 2018 12:24:02 -0500 Received: from out01.mta.xmission.com ([166.70.13.231]:49411 "EHLO out01.mta.xmission.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933132AbeAORX6 (ORCPT ); Mon, 15 Jan 2018 12:23:58 -0500 From: ebiederm@xmission.com (Eric W. Biederman) To: Dave Martin Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org, Arnd Bergmann , Nicolas Pitre , Tony Lindgren , Catalin Marinas , Tyler Baicar , Will Deacon , Oleg Nesterov , James Morse , Olof Johansson , Santosh Shilimkar , linux-arm-kernel@lists.infradead.org, Al Viro References: <87373b6ghs.fsf@xmission.com> <20180112005940.23279-7-ebiederm@xmission.com> <20180115163028.GU22781@e103592.cambridge.arm.com> Date: Mon, 15 Jan 2018 11:23:03 -0600 In-Reply-To: <20180115163028.GU22781@e103592.cambridge.arm.com> (Dave Martin's message of "Mon, 15 Jan 2018 16:30:29 +0000") Message-ID: <87h8rnox3c.fsf@xmission.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-XM-SPF: eid=1eb8UD-0005zH-C2;;;mid=<87h8rnox3c.fsf@xmission.com>;;;hst=in02.mta.xmission.com;;;ip=97.121.73.102;;;frm=ebiederm@xmission.com;;;spf=neutral X-XM-AID: U2FsdGVkX1/xNueZ1pUN/msg4SubiCPG17DnweRBH4o= X-SA-Exim-Connect-IP: 97.121.73.102 X-SA-Exim-Mail-From: ebiederm@xmission.com Subject: Re: [PATCH 07/11] signal/arm64: Document conflicts with SI_USER and SIGFPE, SIGTRAP, SIGBUS X-SA-Exim-Version: 4.2.1 (built Thu, 05 May 2016 13:38:54 -0600) X-SA-Exim-Scanned: Yes (on in02.mta.xmission.com) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: Dave Martin writes: > On Thu, Jan 11, 2018 at 06:59:36PM -0600, Eric W. Biederman wrote: >> Setting si_code to 0 results in a userspace seeing an si_code of 0. >> This is the same si_code as SI_USER. Posix and common sense requires >> that SI_USER not be a signal specific si_code. As such this use of 0 >> for the si_code is a pretty horribly broken ABI. > > I think this situation may have come about because 0 is used as a > padding value for "impossible" cases -- i.e., things that can't happen > unless the kernel is broken, or things that are too unrecoverable for > clean error reporting to be helpful. > > In general, I think these values are not expected to reach userspace in > practice. > > This is not an excuse though -- and not 100% true -- so it's certainly > worthy of cleanup. > > > It would be good to approach this similarly for arm and arm64, since > the arm64 fault code is derived from arm. In this case the fault_info is something I have only seen on arm64. I have been approaching all architectures the same way. If there is insufficient information without architecture expertise to fix this class of error I have been ading FPE_FIXME to them. >> Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a >> value of __SI_KILL and now sees a value of SIL_KILL with the result >> that uid and pid fields are copied and which might copying the si_addr >> field by accident but certainly not by design. Making this a very >> flakey implementation. >> >> Utilizing FPE_FIXME, BUS_FIXME, TRAP_FIXME siginfo_layout will now return >> SIL_FAULT and the appropriate fields will be reliably copied. >> >> But folks this is a new and unique kind of bad. This is massively >> untested code bad. This is inventing new and unique was to get >> siginfo wrong bad. This is don't even think about Posix or what >> siginfo means bad. This is lots of eyeballs all missing the fact >> that the code does the wrong thing bad. This is getting stuck >> and keep making the same mistake bad. >> >> I really hope we can find a non userspace breaking fix for this on a >> port as new as arm64. > >> Possible ABI fixes include: >> - Send the signal without siginfo >> - Don't generate a signal > > The above two sould like ABI breaks? They are ways I have seen code on other platforms deal with not information to generate siginfo. Sending the signal without siginfo is roughly equivalent to your send SIGKILL suggestion below. A good example of that is code that calls force_sigsegv. Calling "force_sig(SIGBUS, current);" is perfectly valid. And then the parent when it reaped the process would have a little more information to go on when guessing what happened to the process. >> - Possibly assign and use an appropriate si_code >> - Don't handle cases which can't happen > > I think a mixture of these two is the best approach. > > In any case, si_code == 0 here doesn't seem to have any explicit meaning. > I think we can translate all of the arm64 faults to proper si_codes -- > see my sketch below. Probably means a bit more thought though. Yes I would be very happy to see that. > The only counterargument would be if there is software relying on > these bogus signal cases getting si_code == 0 for a useful purpose. > > The main reason I see to check for SI_USER is to allow a process to > filter out spurious signals (say, an asynchronous I/O signal for > which si_value would be garbage), and to print out diagnostics > before (in the case of a well-behaved program) resetting the signal > to SIG_DFL and killing itself to report the signal to the waiter. > > Daemons may be more discerning about who is allowed to signal them, > but overloading SIGBUS (say) as an IPC channel sounds like a very odd > thing to do. The same probably applies to any signal that has > nontrivial metadata. Agreed. Although I have seen ltp test cases that do crazy things like that. > Have you found software that is impacted by this in practice? No. I don't expect many userspace applications look at siginfo and everything I have found is some rare hard to trigger non-x86 case which limits the exposure to userspace applications tremendously. The angle I am coming at all of this from is that the linux kernel code that filled out out struct siginfo was not comprehensible or correct. Internal to the kernel it was using a magic value (not exportable to userspace) in the upper bits of si_code. That was causing problems for signal injection and converting signals from 32bit to 64bit, and from 64bit to 32bit. So I wrote kernel/signal.c:siginfo_layout() to figure out which fields of struct siginfo should be sent to userspace. In doing so I discovered that using 0 in si_code (aka SI_USER) is ambiguous, and problematic. Unfortuantely in most of the cases I have spotted using 0 in the si_code requires architectural knowledge that I don't currently have to sort out. So the best I can do is change si_code from 0 to FPE_FIXME/BUS_FIXME/TRAP_FIXME and bring the architecture maintainers attention to this area. One of the problems that results from all of this is that we copy unitialized data to userspace. I am slowly unifying and cleaning the code up so that the code is simple enough we can be certain we are not copying unitialized data to userspace. With si_coes of FPE_FIXME/BUS_FIXME/TRAP_FIXME I can at least attempt to keep the craziness from happening. My next step is to unify struct siginfo and struct compat_siginfo and the functions that copy them to userspace because there are very siginficant problems there. All of that said I like the way you are thinking about fixing these issues. > [...] > >> +++ b/arch/arm64/kernel/fpsimd.c >> @@ -867,7 +867,7 @@ asmlinkage void do_fpsimd_acc(unsigned int esr, struct pt_regs *regs) >> asmlinkage void do_fpsimd_exc(unsigned int esr, struct pt_regs *regs) >> { >> siginfo_t info; >> - unsigned int si_code = 0; >> + unsigned int si_code = FPE_FIXME; >> >> if (esr & FPEXC_IOF) >> si_code = FPE_FLTINV; > > This 0 can happen for vector operations where the implementation may > not be able to report exactly what happened, for example where > the implementer didn't want to pay the cost of tracking exactly > what went wrong in each lane. > > However, the FPEXC_* bits can be garbage in such a case rather > than being all zero: we should be checking the TFV bit in the ESR here. > This may be a bug. > > Perhaps FPE_FLTINV should be returned in si_code for such cases: it's > not otherwise used on arm64 -- invalid instructions would be reported as > SIGILL/ILL_ILLOPC instead). > > Otherwise, we might want to define a new code or arbitrarily pick > one of the existing FLT_* since this is really a more benign condition > than executing an illegal instruction. Alternatively, treat the > fault as spurious and suppress it, but that doesn't feel right either. I would love to see this sorted out. There is a very similar pattern on several different architectures. I suspect if we have a clean solution on one architecture the other architectures will be able to use that solution as well. >> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c >> index 9b7f89df49db..abe200587334 100644 >> --- a/arch/arm64/mm/fault.c >> +++ b/arch/arm64/mm/fault.c >> @@ -596,7 +596,7 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> >> info.si_signo = SIGBUS; >> info.si_errno = 0; >> - info.si_code = 0; >> + info.si_code = BUS_FIXME; > > Probably BUS_OBJERR. > >> if (esr & ESR_ELx_FnV) >> info.si_addr = NULL; >> else >> @@ -607,70 +607,70 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs) >> } >> >> static const struct fault_info fault_info[] = { >> - { do_bad, SIGBUS, 0, "ttbr address size fault" }, >> - { do_bad, SIGBUS, 0, "level 1 address size fault" }, >> - { do_bad, SIGBUS, 0, "level 2 address size fault" }, >> - { do_bad, SIGBUS, 0, "level 3 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "ttbr address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 1 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 2 address size fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "level 3 address size fault" }, > > Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). > > [...] > >> - { do_bad, SIGBUS, 0, "unknown 8" }, >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 8" }, > > [...] > >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 12" }, > > Not architected, so they could mean absolutely anything. If they > can happen at all, they are probably unsafe to ignore. > > -> SIGKILL, or panic(). > > Similary for all the "unknown" codes in the table, which I omit for > brevity. > >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous external abort" }, > > This si_code seems to be a fallback for if ACPI is absent or doesn't > know what to do with this error. > > -> SIGBUS/BUS_OBJERR? > > Can probably legitimately happen for userspace for suitable MMIO mappings. > > Perhaps it's more serious though in the presence of ACPI. Do we expect > that ACPI can diagnose all localisable errors? > >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 (translation table walk)" }, >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 (translation table walk)" }, > > Pagetable screwup or kernel/system/CPU bug -> SIGKILL, or panic(). > >> + { do_sea, SIGBUS, BUS_FIXME, "synchronous parity or ECC error" }, // Reserved when RAS is implemented > > Possibly SIGBUS/BUS_MCEERR_AR (though I don't know exactly what > userspace is supposed to do with this or whether this implies the > existence or certain kernel features for managing the error that > may not be present on arm64...) > > Otherwise, SIGKILL. Yes. The AR Action Required and AO Action optional bits I don't quite understand. But BUS_MCEERR_AR does sound like a good fit. >> + { do_sea, SIGBUS, BUS_FIXME, "level 0 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 1 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 2 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented >> + { do_sea, SIGBUS, BUS_FIXME, "level 3 synchronous parity error (translation table walk)" }, // Reserved when RAS is implemented > > Process page tables corrupt: if the kernel couldn't fix this, the > process can't reasonably fix it -> SIGKILL > > Since this is a RAS-type error it could be triggered by a cosmic ray > rather than requiring a kernel or system bug or other major failure, so > we probably shouldn't panic the system if the error is localisable to a > particular process. > >> { do_alignment_fault, SIGBUS, BUS_ADRALN, "alignment fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "TLB conflict abort" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "Unsupported atomic hardware update fault" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (lockdown abort)" }, > > Userspace shouldn't have access to lockdown: kernel/system bug > -> SIGKILL or panic(). > >> + { do_bad, SIGBUS, BUS_FIXME, "implementation fault (unsupported exclusive)" }, > > If running on an implementation where this fault can happen in response to an exclusive load/store issued by userspace may fail somewhere in the memory system, this should probably be SIGBUS/BUS_OBJERR (or possibly a new BUS_* code). > > This one may need to be hardware-dependent, if this fault can mean > something different depending on the hardware (I'm gussing this > possibility from "implementation" -- I've not checked the docs.) > >> + { do_bad, SIGBUS, BUS_FIXME, "section domain fault" }, >> + { do_bad, SIGBUS, BUS_FIXME, "page domain fault" }, > > Broken kernel, kernel memory corruption, CPU/system bug etc.: > SIGKILL or panic(). > >> }; >> >> int handle_guest_sea(phys_addr_t addr, unsigned int esr) >> @@ -739,11 +739,11 @@ static struct fault_info __refdata debug_fault_info[] = { >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 3" }, >> + { do_bad, SIGTRAP, TRAP_FIXME, "aarch32 vector catch" }, >> + { do_bad, SIGBUS, BUS_FIXME, "unknown 7" }, >> }; > > Impossible (?), or meaning unknown. > SIGKILL/panic() for these? Or possibly (since these are probably well > localised errors) SIGILL/ILL_ILLOPC. I like the way you are thinking on these, and I'd love to see them fixed. Eric