Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750986AbbHTEA4 (ORCPT ); Thu, 20 Aug 2015 00:00:56 -0400 Received: from mx1.redhat.com ([209.132.183.28]:33383 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750765AbbHTEAy (ORCPT ); Thu, 20 Aug 2015 00:00:54 -0400 Date: Wed, 19 Aug 2015 23:00:50 -0500 From: Josh Poimboeuf To: Ingo Molnar Cc: Andrew Morton , Thomas Gleixner , Ingo Molnar , "H. Peter Anvin" , x86@kernel.org, linux-kernel@vger.kernel.org, Michal Marek , Peter Zijlstra , Andy Lutomirski , Borislav Petkov , Linus Torvalds , Andi Kleen , Pedro Alves , Namhyung Kim , Bernd Petrovitsch , Chris J Arges , live-patching@vger.kernel.org Subject: Re: [PATCH v10 03/20] x86/stackvalidate: Compile-time stack validation Message-ID: <20150820040050.GC2944@treble.redhat.com> References: <07bf51833b5e1c52bfd9a4dbda41b80c508dffff.1439521412.git.jpoimboe@redhat.com> <20150815002354.7fb2f21e.akpm@linux-foundation.org> <20150815124913.GB3254@treble.hsd1.ky.comcast.net> <20150819100138.GA10504@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150819100138.GA10504@gmail.com> User-Agent: Mutt/1.5.23.1-rc1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9698 Lines: 227 On Wed, Aug 19, 2015 at 12:01:38PM +0200, Ingo Molnar wrote: > > * Josh Poimboeuf wrote: > > > On Sat, Aug 15, 2015 at 12:23:54AM -0700, Andrew Morton wrote: > > > On Thu, 13 Aug 2015 22:10:24 -0500 Josh Poimboeuf wrote: > > > > > > > This adds a CONFIG_STACK_VALIDATION option which enables a host tool > > > > named stackvalidate which runs at compile time. > > > > > > It would be useful to > > > > > > - show example output for a typical error site > > > > > > - describe the consequences of that error (ie: why should we fix > > > these things?) > > > > > > - describe what needs to be done to repair these sites. > > > > > > > > > IOW, what do we gain from merging all this stuff? > > > > I attempted to do all that in Documentation/stack-validation.txt which > > is in patch 03/20. Does it address your concerns? Here it is: > > I think it answers the first and third question, but not the second one: > > - describe the consequences of that error (ie: why should we fix > these things?) > > would be nice to document all that richly. Ok. I've tried to answer the "why" question from both a broad perspective ("why do we need stackvalidate?") as well as for each of the rules that it enforces. ---8<--- Subject: [PATCH] stackvalidate: Document why it's needed Signed-off-by: Josh Poimboeuf --- Documentation/stack-validation.txt | 122 +++++++++++++++++++++++++++++++++++-- 1 file changed, 118 insertions(+), 4 deletions(-) diff --git a/Documentation/stack-validation.txt b/Documentation/stack-validation.txt index c3d3d35..94dad40 100644 --- a/Documentation/stack-validation.txt +++ b/Documentation/stack-validation.txt @@ -24,6 +24,101 @@ instructions). Similarly, it knows how to follow switch statements, for which gcc sometimes uses jump tables. +Why do we need stack validation? +-------------------------------- + +Here are some of the benefits of validating stack metadata: + +a) More reliable stack traces for frame pointer enabled kernels + + Frame pointers are used for debugging purposes. They allow runtime + code and debug tools to be able to walk the stack to determine the + chain of function call sites that led to the currently executing + code. + + For some architectures, frame pointers are enabled by + CONFIG_FRAME_POINTER. For some other architectures they may be + required by the ABI (sometimes referred to as "backchain pointers"). + + For C code, gcc automatically generates instructions for setting up + frame pointers when the -fno-omit-frame-pointer option is used. + + But for asm code, the frame setup instructions have to be written by + hand, which most people don't do. So the end result is that + CONFIG_FRAME_POINTER is honored for C code but not for most asm code. + + For stack traces based on frame pointers to be reliable, all + functions which call other functions must first create a stack frame + and update the frame pointer. If a first function doesn't properly + create a stack frame before calling a second function, the *caller* + of the first function will be skipped on the stack trace. + + The benefit of stackvalidate here is that it ensures that *all* + functions honor CONFIG_FRAME_POINTER. As a result, no functions will + ever [*] be skipped on a stack trace. + + [*] unless an interrupt or exception has occurred at the very + beginning of a function before the stack frame has been created, + or at the very end of the function after the stack frame has been + destroyed. This is an inherent limitation of frame pointers. + +b) 100% reliable stack traces for DWARF enabled kernels + + (NOTE: This is not yet implemented) + + As an alternative to frame pointers, DWARF Call Frame Information + (CFI) metadata can be used to walk the stack. Unlike frame pointers, + CFI metadata is out of band. So it doesn't affect runtime + performance and it can be reliable even when interrupts or exceptions + are involved. + + For C code, gcc automatically generates DWARF CFI metadata. But for + asm code, generating CFI is a tedious manual approach which requires + manually placed .cfi assembler macros to be scattered throughout the + code. It's clumsy and very easy to get wrong, and it makes the real + code harder to read. + + Stackvalidate will improve this situation in several ways. For code + which already has CFI annotations, it will validate them. For code + which doesn't have CFI annotations, it will generate them. So an + architecture can opt to strip out all the manual .cfi annotations + from their asm code and have stackvalidate generate them instead. + + We might also add a runtime stack validation debug option where we + periodically walk the stack from schedule() and/or an NMI to ensure + that the stack metadata is sane and that we reach the bottom of the + stack. + + So the benefit of stackvalidate here will be that external tooling + should always show perfect stack traces. And the same will be true + for kernel warning/oops traces if the architecture has a runtime + DWARF unwinder. + +c) Higher live patching compatibility rate + + (NOTE: This is not yet implemented) + + Currently with CONFIG_LIVEPATCH there's a basic live patching + framework which is safe for roughly 85-90% of "security" fixes. But + patches can't have complex features like function dependency or + prototype changes, or data structure changes. + + There's a strong need to support patches which have the more complex + features so that the patch compatibility rate for security fixes can + eventually approach something resembling 100%. To achieve that, a + "consistency model" is needed, which allows tasks to be safely + transitioned from an unpatched state to a patched state. + + One of the key requirements of the currently proposed livepatch + consistency model [*] is that it needs to walk the stack of each + sleeping task to determine if it can be transitioned to the patched + state. If stackvalidate can ensure that stack traces are reliable, + this consistency model can be used and the live patching + compatibility rate can be improved significantly. + + [*] https://lkml.kernel.org/r/cover.1423499826.git.jpoimboe@redhat.com + + Rules ----- @@ -35,15 +130,26 @@ To achieve the validation, stackvalidate enforces the following rules: outside of a function, it flags an error since that usually indicates callable code which should be annotated accordingly. + This rule is needed so that stackvalidate can properly identify each + callable function in order to analyze its stack metadata. + 2. Conversely, each section of code which is *not* callable should *not* be annotated as an ELF function. The ENDPROC macro shouldn't be used in this case. + This rule is needed so that stackvalidate can ignore non-callable + code. Such code doesn't have to follow any of the other rules. + 3. Each callable function which calls another function must have the correct frame pointer logic, if required by CONFIG_FRAME_POINTER or the architecture's back chain rules. This can by done in asm code with the FRAME_BEGIN/FRAME_END macros. + This rule ensures that frame pointer based stack traces will work as + designed. If function A doesn't create a stack frame before calling + function B, the _caller_ of function A will be skipped on the stack + trace. + 4. Dynamic jumps and jumps to undefined symbols are only allowed if: a) the jump is part of a switch statement; or @@ -51,10 +157,18 @@ To achieve the validation, stackvalidate enforces the following rules: b) the jump matches sibling call semantics and the frame pointer has the same value it had on function entry. + This rule is needed so that stackvalidate can reliably analyze all of + a function's code paths. If a function jumps to code in another + file, and it's not a sibling call, stackvalidate has no way to follow + the jump because it only analyzes a single file at a time. + 5. A callable function may not execute kernel entry/exit instructions. The only code which needs such instructions is kernel entry code, which shouldn't be be in callable functions anyway. + This rule is just a sanity check to ensure that callable functions + return normally. + Errors in .S files ------------------ @@ -63,8 +177,8 @@ If you're getting an error in a compiled .S file which you don't understand, first make sure that the affected code follows the above rules. -Here are some examples of common problems and suggestions for how to fix -them. +Here are some examples of common warnings reported by stackvalidate, +what they mean, and suggestions for how to fix them. 1. stackvalidate: asm_file.o: func()+0x128: call without frame pointer save/setup @@ -73,8 +187,8 @@ them. updating the frame pointer. If func() is indeed a callable function, add proper frame pointer - logic using the FP_SAVE and FP_RESTORE macros. Otherwise, remove its - ELF function annotation by changing ENDPROC to END. + logic using the FRAME_BEGIN and FRAME_END macros. Otherwise, remove + its ELF function annotation by changing ENDPROC to END. If you're getting this error in a .c file, see the "Errors in .c files" section. -- 2.4.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/