Received: by 2002:a25:8b91:0:0:0:0:0 with SMTP id j17csp872537ybl; Wed, 4 Dec 2019 12:22:47 -0800 (PST) X-Google-Smtp-Source: APXvYqzm/2xLeWfImN7Jo2yaEkNj9ulZlJTzwRKqBw5PiTxvrY9bWp2eVHcgaycYaakFo+S6C/JI X-Received: by 2002:aca:cdd5:: with SMTP id d204mr4279332oig.134.1575490966855; Wed, 04 Dec 2019 12:22:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1575490966; cv=none; d=google.com; s=arc-20160816; b=qQR1Wg3iFdp5w4G3oYydJswrLF4uWFpzK1+HZq+0XI0Va1ITs56wAaY/S302A0PHTh Rj5iWkANrEsmPpUxCJcfHsgme8HyiZeiOdBHbhptMKqwbbCyzSx5jau8zUjXF2V5JLUh tmRFJ/OvwXXChcIuP865Q23N69Cnd1TXtLUQaGxBJ+OJrhwo+zZhi/9as4MLJf/ATIHJ xBQ5UN5BVRJiwSXnScGnPOkZhc89O+DuuUM+mziE8bksynxgUHBfITcEHp2Ve2NEnr8C qZwqbxTp252cafhv8Z8JUuOPvKGZgAodaVirIvD4QJ8hrOQaG3evvs5N3oDkwfQDishe paqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=F++yaIR+2fkvaG0ya07/Gn7rddcDd1FvsMfL1a2QB7c=; b=bq+UyKEYQ/jTuaDgzHttYjgMayoSroZmZzKPk13G71Wdevqc7yzSwTMS2/w3wjf4aq ns2szZFPB9IgNaiV/p6kdYPGqV9GMNcxFWqIywLaRw3NF8YC9ijVBdh0NSF4tTEFD1eY xcNnkloqHcWz7a216lPLT1V3z57AKU7KUpkyw1DZx30e733Gtz5EyYoOd3semRWx0C7/ Azy08h5gOLBvWY778wz7ukO4mNEF4+XzpYqB2h+Bl6QsiuG7/uiS8wnVH3TXOIRkeCw/ FvfhtdYe+y7CEil8G2uLgJpce0ltp8lfAzjLzp5AyY2RrWfxLwEDnMMtiILltoIKvU7f BwHw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=syXInMnB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b18si3526482otl.202.2019.12.04.12.22.33; Wed, 04 Dec 2019 12:22:46 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=syXInMnB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728031AbfLDUVB (ORCPT + 99 others); Wed, 4 Dec 2019 15:21:01 -0500 Received: from mail.kernel.org ([198.145.29.99]:40252 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727887AbfLDUVA (ORCPT ); Wed, 4 Dec 2019 15:21:00 -0500 Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 153BE21826 for ; Wed, 4 Dec 2019 20:20:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1575490859; bh=lO0lBJgLyqcoBTyVD8Q+rC9hNWl674FfWN001V3gRtU=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=syXInMnBdcHfggC2YZeZqDGzqn7ueusXNnCZSx+2ObCK9uquHBJk4i/VSJLYOgdbO CM+zsK5chs3dFq6YujvKCbZkfD3LIGkOXt9MlkweLEdTSvVL66cMs5LYA2lfHO/RMC M6I1sTY9A0N8USjX79MsWlwQw6t/7zolppmive5c= Received: by mail-wm1-f45.google.com with SMTP id t14so1162293wmi.5 for ; Wed, 04 Dec 2019 12:20:59 -0800 (PST) X-Gm-Message-State: APjAAAWjdU06m8zx20rJ70AP9XrjhNKWfixGbQLVtre5QShEaq+TUWz8 z0sP93wrmER2GnUzJwHqlCu+vX7VSvqP+tyqfypONw== X-Received: by 2002:a1c:9c54:: with SMTP id f81mr1523186wme.89.1575490857350; Wed, 04 Dec 2019 12:20:57 -0800 (PST) MIME-Version: 1.0 References: <1570212969-21888-1-git-send-email-chang.seok.bae@intel.com> <20191115191200.GD22747@tassilo.jf.intel.com> In-Reply-To: From: Andy Lutomirski Date: Wed, 4 Dec 2019 12:20:46 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH v9 00/17] Enable FSGSBASE instructions To: "Metzger, Markus T" Cc: Andy Lutomirski , Thomas Gleixner , "Bae, Chang Seok" , "linux-kernel@vger.kernel.org" , "bp@alien8.de" , "hpa@zytor.com" , "Hansen, Dave" , "Luck, Tony" , "Shankar, Ravi V" , Pedro Alves , Simon Marchi , Andi Kleen Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 2, 2019 at 12:23 AM Metzger, Markus T wrote: > > > On Fri, Nov 29, 2019 at 6:56 AM Metzger, Markus T > > wrote: > > > > > > > On Fri, Nov 15, 2019 at 07:29:17PM +0100, Thomas Gleixner wrote: > > > > > On Fri, 4 Oct 2019, Chang S. Bae wrote: > > > > > > > > > > > > Updates from v8 [10]: > > > > > > * Internalized the interrupt check in the helper functions (Andy L.) > > > > > > * Simplified GS base helper functions (Tony L.) > > > > > > * Changed the patch order to put the paranoid path changes before the > > > > > > context switch changes (Tony L.) > > > > > > * Fixed typos (Randy D.) and massaged a few sentences in the > > documentation > > > > > > * Massaged the FSGSBASE enablement message > > > > > > > > > > That still lacks what Andy requested quite some time ago in the V8 thread: > > > > > > > > > > https://lore.kernel.org/lkml/034aaf3a-a93d-ec03-0bbd- > > > > 068e1905b774@kernel.org/ > > > > > > > > > > "I also think that, before this series can have my ack, it needs an > > > > > actual gdb maintainer to chime in, publicly, and state that they have > > > > > thought about and tested the ABI changes and that gdb still works on > > > > > patched kernels with and without FSGSBASE enabled. I realize that there > > > > > were all kinds of discussions, but they were all quite theoretical, and > > > > > I think that the actual patches need to be considered by people who > > > > > understand the concerns. Specific test cases would be nice, too." > > > > > > > > > > What's the state of this? > > > > > > On branch users/mmetzger/fsgs in sourceware.org/git/binutils-gdb.git, > > > there's a GDB test covering the behavior discussed theoretically back then. > > > > > > It covers modifying the selector as well as the base from GDB and using > > > the modified values for inferior calls as well as for resuming the inferior. > > > > > > Current kernels allow changing the selector and provide the resulting > > > base back to the ptracer. They also allow changing the base as long as > > > the selector is zero. That's the behavior we wanted to preserve IIRC. > > > > The general kernel rule is that we don't break working applications. > > Other than that, we're allowed to change the ABI if existing working > > applications don't break. I can't tell whether you wrote a test that > > detects a behavior change or whether you wrote a test that tests > > behavior that gdb or other programs actually rely on. > > Well, that's a tough question. The test covers GDB's behavior on today's > systems. GDB itself does not actually rely on that behavior. That is, GDB > itself wouldn't break. You couldn't do all that you could do with it before, > though. GDB does rely on at least some behavior. If I tell gdb to call a function on my behalf, doesn't it save the old state, call the function, and then restore the state? Surely it expects the restore operation to actually restore the state. > > It would be GDB's users that are affected. How do you tell if anyone is > actually relying on it? No clue. But at least if this type of use is mostly interactive, then users should be that badly affected. It also helps that very, very few 64-bit applications use nonzero segments at all. They used to because of a kernel optimization to automatically load a segment if an FS or GSBASE less than 4GB was requested, but that's been gone for a while. Calling set_thread_area() at all in a 64-bit program requires considerable gymnastics, and distributions can and do disable modify_ldt() outright without significant ill effects. So we're mostly talking about compatibility with 32-bit programs and exotic users like Wine and DOSEMU. > > > > Certainly, with a 32-bit *gdb*, writing a nonzero value to FS or GS > > using ptrace should change the base accordingly. I think the current > > patches get this wrong. > > > > With a 64-bit gdb and a 32-bit inferior, in an ideal world, everything > > would work just like full 64-bit, since that's how the hardware works. > > Not sure what you mean. The h/w runs in compatibility mode and the > inferior cannot set the base directly, can it? I think there's a general impedance mismatch between gdb and the kernel/hw here. On Linux on a 64-bit machine, there's isn't really a strong concept of a "32-bit process" versus a "64-bit process". All tasks have 64-bit values in RAX, all tasks have R8-R15, all tasks have a GDT and an LDT, etc. "32-bit tasks" are merely tasks that happen to be running with a compatibility selector loaded into CS at the time. Tasks can and do switch freely between compatibility and long mode using LJMP or LRET. As far as I can tell, however, gdb doesn't really understand this and thinks that 32-bit tasks are their own special thing. This causes me real problems: gdb explodes horribly if I connect gdb to QEMU's gdbserver (qemu -s) and try to debug during boot when the inferior switches between 32-bit and long mode. As far as FSGSBASE goes, a "32-bit task" absolutely can set independent values in FS and FSBASE, although it's awkward to do so: the task would have to do a far transfer to long mode, then WRFSBASE, then far transfer back to compat mode. But this entire sequence of events could occur without entering the kernel at all, and the ptrace API should be able to represent the result. I think that, ideally, a 64-bit debugger would understand the essential 64-bitness of even compat tasks and work sensibly. I don't really expect gdb to be able to do this any time soon, though. > > > > But we don't necessary live in an ideal world. > > > > With a 64-bit gdb and a 64-bit inferior, the inferior can set FS to > > some nonzero value and then set FSBASE to an arbitrary 64-bit number, > > and FS will retain its value. ptrace needs to give gdb some way to > > read, save, and restore this state. > > With Chang's patch series, that actually works. You can set FS and then > set FSBASE without setting FS to zero previously. The tests do not cover > that since on current system that leads to the inferior crashing in read_fs(). > > > > I think the ideal behavior is that 64-bit ptrace callers should > > control FS and FSBASE independently. The question is: will that break > > things? If it will, then we'll need to make sure that there is an API > > by which a debugger can independently control FS and FSBASE, and we'll > > also need to make sure that whatever existing API debuggers use to > > change FS and expect FSBASE to magically change as well continue to > > have that effect. > > We had discussed this some time ago and proposed the following behavior: " > https://lore.kernel.org/lkml/1521481767-22113-15-git-send-email-chang.seok.bae@intel.com/ > > In a summary, ptracer's update on FS/GS selector and base > yields such results on tracee's base: > - When FS/GS selector only changed (to nonzero), fetch base > from GDT/LDT (legacy behavior) > - When FS/GS base (regardless of selector) changed, tracee > will have the base > " Indeed. But I never understood how this behavior could be implemented with the current ABI. As I understand it, gdb only ever sets the inferior register state by using a single ptrace() call to load the entire state, which means that the kernel does not know whether just FS is being written or whether FS and FSBASE are being written. What actual ptrace() call does gdb use when a 64-bit gdb debugs a 64-bit inferior? How about a 32-bit inferior? > > The ptracer would need to read registers back after changing the selector > to get the updated base. What would the actual API be? I think it could make sense to add a whole new ptrace() command to tell the tracee to, in effect, MOV a specified value to a segment register. This call would have the actual correct semantics in which it would return an error code if the specified value is invalid and would return 0 on success. And then a second ptrace() call could be issued to read out FSBASE or GSBASE if needed. Would this be useful? What gdb commands would invoke it? > > The only time when both change at the same time, then, is when registers > are restored after returning from an inferior call. And then, it's the base > we want to take priority since we previously ensured that the base is always > up-to-date. Right. But how does the kernel tell the difference? > > > > The base of a segment in a descriptor table is 32 bits, full stop. > > This is a hardware limitation and has nothing to do with the kernel. > > base_addr is correctly unsigned int in the kernel headers. If you > > actually find a system where base_addr is unsigned long and unsigned > > long is 64 bits, then your test will malfunction. > > The modify_ldt(2) man page says: " > The user_desc structure is defined in as: > > struct user_desc { > unsigned int entry_number; > unsigned long base_addr; > unsigned int limit; > unsigned int seg_32bit:1; > unsigned int contents:2; > unsigned int read_exec_only:1; > unsigned int limit_in_pages:1; > unsigned int seg_not_present:1; > unsigned int useable:1; > }; > " > > The declaration in asm/ldt.h actually defines base_addr as unsigned int. > > So my comment about 'some 64-bit systems' is wrong and should actually > say 'all systems'. Will fix. > > That by itself is not an issue as long as the main executable is not loaded at > a high address. I only ran into problems with that on some ubuntu system > in our test pool. In my test cases, I use mmap() with MAP_32BIT to avoid this issue.