Date: Mon, 21 Nov 2016 08:13:42 +0100
From: Ingo Molnar <mingo@kernel.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>, Andy Lutomirski <luto@kernel.org>,
        tedheadster@gmail.com, Linus Torvalds <torvalds@linux-foundation.org>,
        "H. Peter Anvin" <hpa@zytor.com>, George Spelvin <linux@horizon.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        X86 ML <x86@kernel.org>
Subject: Re: What exactly do 32-bit x86 exceptions push on the stack in the
 CS slot?
Message-ID: <20161121071342.GA16999@gmail.com>
References: <CALCETrUqdp=rEKX4gdSpJYder3q0g_yxdRE9APw8MgerXvnB=w@mail.gmail.com>
 <CAMzpN2h_1m3wcSpvNxC4FyOrDBnn50Estwk_v_zc7=NNGxW_zg@mail.gmail.com>
 <CALCETrU7voFkTpKmyo2ujEAuEUVOg3r-FKspnCWXQ-pXpQamDg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrU7voFkTpKmyo2ujEAuEUVOg3r-FKspnCWXQ-pXpQamDg@mail.gmail.com>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2669
Lines: 62


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Sat, Nov 19, 2016 at 6:11 PM, Brian Gerst <brgerst@gmail.com> wrote:
> > On Sat, Nov 19, 2016 at 8:52 PM, Andy Lutomirski <luto@kernel.org> wrote:
> >> This is a question for the old-timers here, since I can't find
> >> anything resembling an answer in the SDM.
> >>
> >> Suppose an exception happens (#UD in this case, but I assume it
> >> doesn't really matter).  We're not in long mode, and the IDT is set up
> >> to deliver to a normal 32-bit kernel code segment.  We're running in
> >> that very same code segment when the exception hits, so no CPL change
> >> occurs and the TSS doesn't particularly matter.
> >>
> >> The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
> >> happens to the high word of CS on the stack?
> >>
> >> The SDM appears to say nothing at all about this.  Modern systems
> >> (e.g. my laptop running in 32-bit legacy mode under KVM) appear to
> >> zero-extend CS.  But Matthew's 486DX appears to put garbage in the
> >> high bits (or maybe just leave whatever was already on the stack in
> >> place).
> >>
> >> Do any of you happen to know what's going on and when the behavior
> >> changed?  I'd like to know just how big of a problem this is.  Because
> >> if lots of CPUs work like Matthew's, we have lots of subtle bugs on
> >> them.
> >>
> >> --Andy
> >
> > This came up a while back, and it was determined that we can't assume
> > zero-extension in 32-bit mode because older processors only do a
> > 16-bit write even on a 32-bit push.  So all segments have to be
> > treated as 16-bit values, or we have to explicitly zero-extend them.
> >
> > All 64-bit capable processors do zero-extend segments, even in 32-bit mode.
> 
> This almost makes me want to change the definition of pt_regs on
> 32-bit rather than fixing all the entry code.

So I have applied your fix that addresses the worst fallout directly:

  fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()

... but otherwise we might be better off zeroing out the high bits of segment
registers stored on the stack, in all entry code pathways - maybe using a single
function that is only called on pre-PPro CPUs, so that the function call is
patched out on modern CPUs.
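
As a rough illustration only (plain userspace C, not the actual entry code: the
struct layout and both helper names are made up for this sketch, and the
patching-out of the call on modern CPUs is not shown), the zeroing could look
something like this:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the segment slots of a saved 32-bit frame.
 * This is NOT the real pt_regs layout; it only models 32-bit stack slots
 * whose upper 16 bits may hold stale data after a 16-bit write.
 */
struct seg_frame {
	uint32_t cs;
	uint32_t ss;
	uint32_t ds;
	uint32_t es;
};

/* A selector is 16 bits wide: keep the low word, drop the high word. */
static inline uint32_t seg_selector(uint32_t slot)
{
	return slot & 0xffff;
}

/*
 * Hypothetical single fixup helper: zero the high word of every saved
 * segment slot once, so later users can compare the slots directly
 * without masking them at each use.
 */
static inline void fixup_seg_high_bits(struct seg_frame *f)
{
	f->cs = seg_selector(f->cs);
	f->ss = seg_selector(f->ss);
	f->ds = seg_selector(f->ds);
	f->es = seg_selector(f->es);
}
```

With something like that done once at entry, the per-use "& 0xffff" masking
would no longer be needed in the later code.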

This would remove the 0xffff hacks from the exception handling hot path on
CPUs that zero-extend.

That is assuming we care about pre-PPro CPUs. I suspect we do in an abstract
sense - P5 and earlier CPU designs might continue popping up in embedded
designs, due to their (relative) simplicity.

In practice it almost certainly does not matter at all, as shown by the longevity 
of this regression.

Thanks,

	Ingo