From: Andy Lutomirski
Date: Thu, 18 Oct 2018 10:29:13 -0700
Subject: Re: [RFC PATCH 1/5] x86: introduce preemption disable prefix
To: Nadav Amit
Cc: Ingo Molnar, Andrew Lutomirski, Peter Zijlstra, "H. Peter Anvin", Thomas Gleixner, LKML, X86 ML, Borislav Petkov, "Woodhouse, David"
References: <20181018005420.82993-1-namit@vmware.com> <20181018005420.82993-2-namit@vmware.com> <07255D2B-0243-4254-B62A-37050C44207E@vmware.com> <925F22EA-F8CB-4194-B96B-378409ED7918@vmware.com> <2626124E-7344-42F3-AD07-0BB34D62A9EE@amacapital.net> <2054C1A9-37C1-4A5A-A716-EDAC90564D2A@vmware.com>
In-Reply-To: <2054C1A9-37C1-4A5A-A716-EDAC90564D2A@vmware.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 18, 2018 at 10:25 AM Nadav Amit wrote:
>
> at 10:00 AM, Andy Lutomirski wrote:
>
> >
> >
> >> On Oct 18, 2018, at 9:47 AM, Nadav Amit wrote:
> >>
> >> at 8:51 PM, Andy Lutomirski wrote:
> >>
> >>>> On Wed, Oct 17, 2018 at 8:12 PM Nadav Amit wrote:
> >>>> at 6:22 PM, Andy Lutomirski wrote:
> >>>>
> >>>>>> On Oct 17, 2018, at 5:54 PM, Nadav Amit wrote:
> >>>>>>
> >>>>>> It is sometimes beneficial to prevent preemption for very few
> >>>>>> instructions, or prevent preemption for some instructions that precede
> >>>>>> a branch (this latter case will be introduced in the next patches).
> >>>>>>
> >>>>>> To provide such functionality on x86-64, we use an empty REX-prefix
> >>>>>> (opcode 0x40) as an indication that preemption is disabled for the
> >>>>>> following instruction.
> >>>>>
> >>>>> Nifty!
> >>>>>
> >>>>> That being said, I think you have a few bugs. First, you can’t just ignore
> >>>>> a rescheduling interrupt, as you introduce unbounded latency when this
> >>>>> happens -- you’re effectively emulating preempt_enable_no_resched(), which
> >>>>> is not a drop-in replacement for preempt_enable(). To fix this, you may
> >>>>> need to jump to a slow-path trampoline that calls schedule() at the end or
> >>>>> consider rewinding one instruction instead. Or use TF, which is only a
> >>>>> little bit terrifying…
> >>>>
> >>>> Yes, I didn’t pay enough attention here. For my use-case, I think that the
> >>>> easiest solution would be to make synchronize_sched() ignore preemptions
> >>>> that happen while the prefix is detected. It would slightly change the
> >>>> meaning of the prefix.
> >>
> >> So thinking about it further, rewinding the instruction seems the easiest
> >> and most robust solution. I’ll do it.
> >>
> >>>>> You also aren’t accounting for the case where you get an exception that
> >>>>> is, in turn, preempted.
> >>>>
> >>>> Hmm.. Can you give me an example for such an exception in my use-case? I
> >>>> cannot think of an exception that might be preempted (assuming #BP, #MC
> >>>> cannot be preempted).
> >>>
> >>> Look for cond_local_irq_enable().
> >>
> >> I looked at it. Yet, I still don’t see how exceptions might happen in my
> >> use-case, but having said that - this can be fixed too.
> >
> > I’m not totally certain there’s a case that matters. But it’s worth checking
> >
> >> To be frank, I paid relatively little attention to this subject. Any
> >> feedback about the other parts and especially on the high-level approach? Is
> >> modifying the retpolines in the proposed manner (assembly macros)
> >> acceptable?
> >
> > It’s certainly a neat idea, and it could be a real speedup.
>
> Great. So I’ll try to shape things up, and I still wait for other comments
> (from others).
>
> I’ll just mention two more patches I need to cleanup (I know I still owe you some
> work, so obviously it will be done later):
>
> 1. Seccomp trampolines. On my Ubuntu, when I run Redis, systemd installs 17
> BPF filters on the Redis server process that are invoked on each
> system-call. Invoking each one requires an indirect branch. The patch keeps
> a per-process kernel code-page that holds trampolines for these functions.

I wonder how many levels of branches are needed before the branches
involved exceed the retpoline cost.

>
> 2. Binary-search for system-calls. Use the per-process kernel code-page also
> to hold multiple trampolines for the 16 common system calls of a certain
> process.
> The patch uses an indirection table and a binary-search to find the
> proper trampoline.

Same comment applies here.

>
> Thanks again,
> Nadav

-- 
Andy Lutomirski
AMA Capital Management, LLC
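
For readers following item 2 above, here is a rough userspace sketch of what a binary search over a small, sorted table of "hot" system-call numbers could look like. It is not taken from the patch: the names hot_entry, hot_table, dispatch and slow_path, the three example handlers, and the table contents are all made up for illustration. It also models only the lookup. The final call here goes through a C function pointer, which is itself an indirect call, whereas the patch as described would branch into a per-process trampoline.

```c
#include <stddef.h>

/* Hypothetical handler type standing in for a per-process trampoline. */
typedef long (*handler_fn)(long arg);

/* One slot of the hypothetical indirection table. */
struct hot_entry {
    int nr;          /* system-call number (illustrative values only) */
    handler_fn fn;   /* handler reached when nr matches */
};

/* Toy handlers so the sketch can be exercised. */
static long handle_read(long arg)  { return arg + 1; }
static long handle_write(long arg) { return arg + 2; }
static long handle_close(long arg) { return arg + 3; }

/* Fallback for numbers that are not in the hot table. */
static long slow_path(int nr, long arg)
{
    (void)nr;
    (void)arg;
    return -1;
}

/* Must stay sorted by nr in ascending order for the search below. */
static const struct hot_entry hot_table[] = {
    { 0, handle_read },
    { 1, handle_write },
    { 3, handle_close },
};

/* Binary-search the hot table; fall back to slow_path() on a miss. */
static long dispatch(int nr, long arg)
{
    size_t lo = 0;
    size_t hi = sizeof(hot_table) / sizeof(hot_table[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (hot_table[mid].nr == nr)
            return hot_table[mid].fn(arg);
        if (hot_table[mid].nr < nr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return slow_path(nr, arg);
}
```

With 16 entries, as in the description above, the loop makes at most five probes before either calling a handler or falling through to the slow path. Whether those conditional branches end up cheaper than a single retpoline is the open question raised in the reply, and this sketch does not try to answer it.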