Received: by 2002:a05:6a10:a0d1:0:0:0:0 with SMTP id j17csp3141491pxa; Tue, 25 Aug 2020 12:33:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzdnuGlcMxC4bqfo5DFaTtw70/JnATJSYFp77xAeBVjPtnjSNP4c1ckovRRhV/pe4WyIXpP X-Received: by 2002:aa7:df8c:: with SMTP id b12mr12102696edy.263.1598384034009; Tue, 25 Aug 2020 12:33:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1598384034; cv=none; d=google.com; s=arc-20160816; b=HTg0NLEN7hmkVgAHcr+KbNMV3psUcaXezFX08HbugN88SgAMPvI0HDCFyhMhL6Ren8 VRd2Ki7ZYgwb6iM0oa6r8P6FBL7ZpKWLLJApCKMyzUrZK+QAC2w89B1D1df10iDlbxi8 Xq9qsYN8b7a2okdZPojxxICTYuZSbhn8tcYtRei47nl5QIW5e16z2C/GpQk3y0Ra6uZg YUJFBCoGwaJLT4qF2r05YzsKyTl1uy91uXT9VNQNqO+Cd9KuCLtC3cOxt9oYlU2t8nVc jSwcGMalzTuWuPh9iDFnUH2PIPjUQg1nIZBtmEuWzpkPu/PCb9Lm/Bcq1SOe09m0+Sjx zo8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:cc:to:subject :message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=NPn3j52sAQTIJ78k+sNoL5unBY7IyofGbGsICaF63IA=; b=qCG9QXcy4AGDY7sYM44hkrHK6GI/WPYtkbW8hAzqHzjOcFkotglOysNLXqfMMrc16x a+j0Bd1f9id8mC0p7xoOOQN3Y3SxwD9hqT5npkiuPKt8Dt40Zt4YcaH94v4jsCkgGoJF bVSNpnUVM6CsG/COHlLp3orlELRHyJUgvAbepNYMHK8Uf936OB37mm4h3/GBKTWZiagH 7pwy0zir8nsmOjHjvHIaHU7nyj+JYprgSbdXlQg1efZetokNwF/Woy5Cq7Hn3bJisnfK 9ZsSP4Nu+D6YIJqchHxZxHH/wAHGe8h4w4eNJagwRsQZXknpYZSMdyMLPDD6ynN2dsMr kThA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=DLCQT1HM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id g24si9996ejd.207.2020.08.25.12.33.29; Tue, 25 Aug 2020 12:33:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=DLCQT1HM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726391AbgHYTcU (ORCPT + 99 others); Tue, 25 Aug 2020 15:32:20 -0400 Received: from mail.kernel.org ([198.145.29.99]:35050 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726090AbgHYTcU (ORCPT ); Tue, 25 Aug 2020 15:32:20 -0400 Received: from mail-wr1-f51.google.com (mail-wr1-f51.google.com [209.85.221.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id CD9BB2076C for ; Tue, 25 Aug 2020 19:32:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1598383939; bh=nmIX0CDjceyKvGjRzvTncu508gplaJItsNm8ytqKqdQ=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=DLCQT1HMKJ6i3NCy8d7cAa97A8jf7CHeF3ycGqv8RE3AtFKMpatNcyh0ObVC6Pw0Z /JZ8OM90UODrryZ+xdTJbUkSXkJGzyX/znuvx5dlW3LULUxBO+t4D2Ct3OuCR5P/Po r8DGk3BmCOXpzJV4/ir7tCB7glenfrp5Klx/sED8= Received: by mail-wr1-f51.google.com with SMTP id h15so7313724wrt.12 for ; Tue, 25 Aug 2020 12:32:18 -0700 (PDT) X-Gm-Message-State: AOAM532uU/oSpZXkTNhvVbStVDnEUpQeiYpsTdnfjsDogUU8oH2kG6pC WRvfWN8gA/p8SvZ/vb4g/Ur+1yJTtePIbZDRIdaYeQ== X-Received: by 2002:a5d:4145:: with SMTP id c5mr12277770wrq.18.1598383937286; Tue, 25 Aug 2020 12:32:17 -0700 (PDT) MIME-Version: 1.0 References: <7DF88F22-0310-40C9-9DA6-5EBCB4877933@amacapital.net> In-Reply-To: From: Andy Lutomirski Date: Tue, 25 Aug 2020 12:32:05 -0700 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system To: Kyle Huey Cc: Andy Lutomirski , "H. Peter Anvin" , "Robert O'Callahan" , "Bae, Chang Seok" , Thomas Gleixner , Ingo Molnar , Andi Kleen , "Shankar, Ravi V" , LKML , "Hansen, Dave" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 25, 2020 at 11:50 AM Kyle Huey wrote: > > On Tue, Aug 25, 2020 at 10:31 AM Kyle Huey wrote: > > > > On Tue, Aug 25, 2020 at 9:46 AM Andy Lutomirski wrote= : > > > > > > On Tue, Aug 25, 2020 at 9:32 AM Kyle Huey wrote: > > > > > > > > On Tue, Aug 25, 2020 at 9:12 AM Andy Lutomirski wrote: > > > > > I don=E2=80=99t like this at all. Your behavior really shouldn=E2= =80=99t depend on > > > > > whether the new instructions are available. Also, some day I wou= ld > > > > > like to change Linux to have the new behavior even if FSGSBASE > > > > > instructions are not available, and this will break rr again. (T= he > > > > > current !FSGSBASE behavior is an ugly optimization of dubious val= ue. > > > > > I would not go so far as to describe it as correct.) > > > > > > > > Ok. > > > > > > > > > I would suggest you do one of the following things: > > > > > > > > > > 1. Use int $0x80 directly to load 32-bit regs into a child. This > > > > > might dramatically simplify your code and should just do the righ= t > > > > > thing. > > > > > > > > I don't know what that means. > > > > > > This is untested, but what I mean is: > > > > > > static int ptrace32(int req, pid_t pid, int addr, int data) { > > > int ret; > > > /* new enough kernels won't clobber r8, etc. */ > > > asm volatile ("int $0x80" : "=3Da" (ret) : "a" (26 /* ptrace */), = "b" > > > (req), "c" (pid), "d" (addr), "S" (data) : "flags", "r8", "r9", "r10"= , > > > "r11"); > > > return ret; > > > } > > > > > > with a handful of caveats: > > > > > > - This won't compile with -fPIC, I think. Instead you'll need to > > > write a little bit of asm to set up and restore ebx yourself. gcc is > > > silly like this. > > > > > > - Note that addr is an int. You'll need to mmap(..., MAP_32BIT, ...= ) > > > to get a buffer that can be pointed to with an int. > > > > > > The advantage is that this should work on all kernels that support > > > 32-bit mode at all. > > > > > > > > > > > > 2. Something like your patch but make it unconditional. > > > > > > > > > > 3. Ask for, and receive, real kernel support for setting FS and G= S in > > > > > the way that 32-bit code expects. > > > > > > > > I think the easiest way forward for us would be a PTRACE_GET/SETREG= SET > > > > like operation that operates on the regsets according to the > > > > *tracee*'s bitness (rather than the tracer, as it works currently). > > > > Does that sound workable? > > > > > > > > > > Strictly speaking, on Linux, there is no unified concept of a task's > > > bitness, so "set all these registers according to the target's > > > bitness" is not well defined. We could easily give you a > > > PTRACE_SETREGS_X86_32, etc, though. > > > > In the process of responding to this I spent some time doing code > > inspection and discovered a subtlety in the ptrace API that I was > > previously unaware of. PTRACE_GET/SETREGS use the regset views > > corresponding to the tracer but PTRACE_GET/SETREGSET use the regset > > views corresponding to the tracee. This means it is possible for us > > today to set FS/GS "the old way" with a 64 bit tracer/32 bit tracee > > combo, as long as we use PTRACE_SETREGSET with NT_PRSTATUS instead of > > PTRACE_SETREGS. > > Alright I reverted the previous changes and switched us to use > PTRACE_SETREGSET with NT_PRSTATUS[0] and our 32 bit tests pass again. > I assume this behavior will remain unchanged indefinitely even when > the fs/gsbase manipulation instructions are not available since the 32 > bit user_regs_struct can't grow? Correct. I think it would be reasonable to add new, less erratic PTRACE_SETxyz modes, but the old ones will stay as is. Strictly speaking, the issue you had wasn't a ptrace change. It was a context switching change. Before the change, you were programming garbage into the tracee state, but the context switch code wasn't capable of restoring the garbage correctly and instead happened to restore the state you actually wanted. With the new code, the context switch actually worked correctly (for some value of correctly), and the tracee crashed. Not that this distinction really matters.