From: Andy Lutomirski
Date: Thu, 28 Jun 2018 13:33:24 -0700
Subject: Re: [PATCH v3 1/7] x86/ldt: refresh %fs and %gs in refresh_ldt_segments()
To: "H. Peter Anvin"
Cc: Andy Lutomirski, "H. Peter Anvin", LKML, Ingo Molnar, Thomas Gleixner, "Bae, Chang Seok", "Metzger, Markus T"

On Wed, Jun 27, 2018 at 11:22 AM, wrote:
> On June 27, 2018 11:19:12 AM PDT, Andy Lutomirski wrote:
>>On Fri, Jun 22, 2018 at 11:47 AM, Andy Lutomirski wrote:
>>>
>>>> On Jun 22, 2018, at 11:29 AM, H. Peter Anvin wrote:
>>>>
>>>>> On 06/22/18 07:24, Andy Lutomirski wrote:
>>>>>
>>>>> That RPL3 part is false.  The following program does:
>>>>>
>>>>> #include <stdio.h>
>>>>>
>>>>> int main()
>>>>> {
>>>>>     unsigned short sel;
>>>>>     asm volatile ("mov %%ss, %0" : "=rm" (sel));
>>>>>     sel &= ~3;
>>>>>     printf("Will write 0x%hx to GS\n", sel);
>>>>>     asm volatile ("mov %0, %%gs" :: "rm" (sel & ~3));
>>>>>     asm volatile ("mov %%gs, %0" : "=rm" (sel));
>>>>>     printf("GS = 0x%hx\n", sel);
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> prints:
>>>>>
>>>>> Will write 0x28 to GS
>>>>> GS = 0x28
>>>>>
>>>>> The x86 architecture is *insane*.
>>>>>
>>>>> Other than that, this patch seems generally sensible.  But my
>>>>> objection that it's incorrect with FSGSBASE enabled for %fs and %gs
>>>>> still applies.
>>>>>
>>>>
>>>> Ugh, you're right... I misremembered.  The CPL simply overrides the
>>>> RPL rather than trapping.
>>>>
>>>> We still need to give legacy applications which have zero idea about
>>>> the separate bases that apply only to 64-bit mode a way to DTRT.
>>>> Requiring these old crufty applications to do something new is not
>>>> an option.
>>>>
>>>> As ugly as it is, I'm thinking the Right Thing is to simply make it
>>>> a part of the Linux ABI that if the FS or GS selector registers
>>>> point into the LDT then we will requalify them; if a 64-bit app does
>>>> that then they get that behavior.  This isn't something that will
>>>> happen asynchronously, and if a 64-bit process loads an LDT value
>>>> into FS or GS, they are considered to have opted in to that
>>>> behavior.
>>>
>>> But the old and crusty apps don’t depend on requalification because
>>> we never used to do it.
>>>
>>> I’m not convinced we ever need to refresh the base.  In fact, we
>>> could start preserving the base of LDT-referencing FS/GS across
>>> context switches even without FSGSBASE at some minor performance
>>> cost, but I don’t really see the point.  I still think my proposed
>>> semantics are easy to implement and preserve the ABI even if they
>>> have the sad property that the FSGSBASE behavior and the non-FSGSBASE
>>> behavior end up different.
>>>
>>
>>There's another reasonable solution: do exactly what your patch does,
>>minus the bugs.  We would need to get the RPL != 3 case right (easy)
>>and the case where there's a non-running thread using the selector in
>>question.  The latter is probably best handled by adding a flag to
>>thread_struct that says "fsbase needs reloading from the descriptor
>>table" and only applies if the selector is in the LDT or TLS area.  Or
>>we could hijack a high bit in the selector.  Then we'd need to update
>>everything that uses the fields.
>
> Obviously fix the bugs.
>
> How would you control this bit?

Sorry, I was wrong in my previous email.
Let me try this again:

Notwithstanding the RPL thing, the reason I don't like your patch as
is, and the reason I didn't write a similar patch myself, is that it
will behave nondeterministically on an FSGSBASE kernel.  Suppose there
are two threads, A and B, that share an mm.  A has %fs == 0x7 and
FSBASE = 0.  The LDT has the base for entry 0 set to 0.

Now thread B calls modify_ldt to change the base for entry 0 to 1.

The Obviously Sane (tm) behavior is for task A's FSBASE to
asynchronously change to 1.  This is the only deterministic behavior
that is even possible on a 32-bit kernel, and it's the only
not-totally-nutty behavior that is possible on a 64-bit non-FSGSBASE
kernel, and it's still perfectly reasonable for FSGSBASE.  The problem
is that it's not so easy to implement.

With your patch, on an FSGSBASE kernel, we get the desired behavior if
thread A is running while thread B calls modify_ldt().  But we get
different behavior if thread A is stopped -- thread A's FSBASE will
remain set to 0.

With that in mind, my email was otherwise garbage, and the magic "bit"
idea was total crap.

I can see three vaguely credible ways to implement this.

1. thread B walks all threads on the system, notices that thread A has
the same mm, and asynchronously fixes it up.  The locking is a bit
tricky, and the performance isn't exactly great.  Maybe that's okay.

2. We finally add an efficient way to find all threads that share an
mm and do (1) but faster.

3. We add enough bookkeeping to thread_struct so that, the next time
thread A runs or has ptrace try to read its FSBASE, we notice that
FSBASE is stale and fix it up.

(3) will perform the best, but the implementation is probably nasty.
If we want modify_ldt() to only reset the base for the modified
records, we probably need a version number for each of the 8192
possible LDT entries stored in ldt_struct (which will double its size,
but so what?).
Then we need thread_struct to store the version number of the LDT
entries that fsindex and gsindex refer to.  Now we make sure that
every code path that reads fsbase or gsbase first calls some
revalidate_fs_and_gs() function that will reset the bases and maybe
even the selectors if needed.  Getting the locking right on that last
bit is possibly a bit tricky, since we may need the LDT lock to be
held across the revalidation *and* whatever subsequent code actually
reads the values.

I think that (3) is the nicest solution, but it would need to be
implemented.  What do you think?
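
For concreteness, the bookkeeping in (3) can be modeled in plain
userspace C.  This is only a rough sketch and not kernel code:
ldt_model, thread_model, sel_index(), model_modify_ldt(),
model_load_fs(), model_revalidate_fs() and run_scenario() are all
hypothetical names, the real ldt_struct and thread_struct look
different, only %fs is handled, and all locking is left out.  It
replays the thread A / thread B example above with a per-entry version
number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define LDT_ENTRIES 8192u  /* number of possible LDT entries */
#define SEL_TI_LDT  0x4u   /* selector bit 2 set: selector refers to the LDT */

/* Hypothetical stand-in for ldt_struct: one base and one version per entry. */
struct ldt_model {
	uint64_t base[LDT_ENTRIES];
	uint64_t version[LDT_ENTRIES];
};

/* Hypothetical stand-in for the FS-related fields of thread_struct. */
struct thread_model {
	uint16_t fsindex;     /* selector loaded into %fs */
	uint64_t fsbase;      /* saved FSBASE */
	uint64_t fs_version;  /* version of the LDT entry when fsbase was saved */
};

/* Index of the descriptor a selector refers to (drops the TI and RPL bits). */
static unsigned int sel_index(uint16_t sel)
{
	return sel >> 3;
}

/* Models modify_ldt() changing one entry's base: bump that entry's version. */
static void model_modify_ldt(struct ldt_model *ldt, unsigned int idx,
			     uint64_t base)
{
	assert(idx < LDT_ENTRIES);
	ldt->base[idx] = base;
	ldt->version[idx]++;
}

/* Models loading an LDT selector into %fs: record the base and its version. */
static void model_load_fs(struct thread_model *t, const struct ldt_model *ldt,
			  uint16_t sel)
{
	t->fsindex = sel;
	if (sel & SEL_TI_LDT) {
		t->fsbase = ldt->base[sel_index(sel)];
		t->fs_version = ldt->version[sel_index(sel)];
	}
}

/* Models revalidate_fs_and_gs() for %fs only: refresh the base if stale. */
static void model_revalidate_fs(struct thread_model *t,
				const struct ldt_model *ldt)
{
	unsigned int idx = sel_index(t->fsindex);

	if (!(t->fsindex & SEL_TI_LDT))
		return;  /* not an LDT selector, nothing to track */
	if (t->fs_version != ldt->version[idx]) {
		t->fsbase = ldt->base[idx];
		t->fs_version = ldt->version[idx];
	}
}

/* Replays the thread A / thread B example; returns A's FSBASE at the end. */
uint64_t run_scenario(void)
{
	static struct ldt_model ldt;
	struct thread_model a = { 0 };

	memset(&ldt, 0, sizeof(ldt));    /* entry 0 starts with base 0 */
	model_load_fs(&a, &ldt, 0x7);    /* A: %fs == 0x7 (LDT entry 0), FSBASE = 0 */
	model_modify_ldt(&ldt, 0, 1);    /* B: base of entry 0 becomes 1 */
	printf("A stopped, FSBASE still %llu\n", (unsigned long long)a.fsbase);
	model_revalidate_fs(&a, &ldt);   /* A runs again or is ptraced */
	printf("A revalidated, FSBASE now %llu\n", (unsigned long long)a.fsbase);
	return a.fsbase;
}
```

Calling run_scenario() from a main() shows the stale base of 0 while A
is stopped and the refreshed base of 1 after revalidation.  The model
says nothing about the hard part, which is holding the LDT lock across
the revalidation and the subsequent read.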