From: Kees Cook
Date: Thu, 10 Jan 2019 15:07:38 -0800
Subject: Re: [RFC PATCH v7 00/16] Add support for eXclusive Page Frame Ownership
To: Khalid Aziz
Cc: Andy Lutomirski, Dave Hansen, Ingo Molnar, Juerg Haefliger, Tycho Andersen, jsteckli@amazon.de, Andi Kleen, Linus Torvalds, liran.alon@oracle.com, Konrad Rzeszutek Wilk, deepa.srinivasan@oracle.com, chris hyser, Tyler Hicks, "Woodhouse, David", Andrew Cooper, Jon Masters, Boris Ostrovsky, kanth.ghatraju@oracle.com, joao.m.martins@oracle.com, Jim Mattson, pradeep.vincent@oracle.com, John Haxby, "Kirill A. Shutemov", Christoph Hellwig, steven.sistare@oracle.com, Kernel Hardening, Linux-MM, LKML, Thomas Gleixner
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 10, 2019 at 1:10 PM Khalid Aziz wrote:
> I implemented a solution to reduce the performance penalty, and
> that has had a large impact.
> When XPFO code flushes stale TLB entries, it does so for all CPUs on
> the system, which may include CPUs that do not have any matching TLB
> entries or may never be scheduled to run the userspace task causing
> the TLB flush. The problem is made worse by the fact that if the
> number of entries being flushed exceeds tlb_single_page_flush_ceiling,
> it results in a full TLB flush on every CPU. A rogue process can launch
> a ret2dir attack only from a CPU that has the dual mapping for its
> pages in physmap in its TLB. We can hence defer the TLB flush on a CPU
> until a process that would have caused a TLB flush is scheduled on that
> CPU. I have added a cpumask to task_struct which is then used to post a
> pending TLB flush on CPUs other than the one a process is running on.
> This cpumask is checked when a process migrates to a new CPU, and the
> TLB is flushed at that time. I measured system time for a parallel make
> with an unmodified 4.20 kernel, 4.20 with the XPFO patches before this
> optimization, and then again after applying this optimization. Here are
> the results:
>
> Hardware: 96-core Intel Xeon Platinum 8160 CPU @ 2.10GHz, 768 GB RAM
> make -j60 all
>
> 4.20                        915.183s
> 4.20+XPFO                 24129.354s  26.366x
> 4.20+XPFO+Deferred flush   1216.987s   1.330x
>
>
> Hardware: 4-core Intel Core i5-3550 CPU @ 3.30GHz, 8 GB RAM
> make -j4 all
>
> 4.20                        607.671s
> 4.20+XPFO                  1588.646s   2.614x
> 4.20+XPFO+Deferred flush    794.473s   1.307x

Well, that's an impressive improvement! Nice work. :)

(Could the cpumask improvements be extended to other TLB flushing
needs? i.e. could there be other performance gains with that code even
for a non-XPFO system?)

> 30+% overhead is still very high and there is room for improvement.
> Dave Hansen had suggested batch updating TLB entries and Tycho had
> created an initial implementation, but I have not been able to get
> that to work correctly. I am still working on it, and I suspect we
> will see a noticeable improvement in performance with that.
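For reference, the "Nx" column in the quoted results is the patched system time divided by the unmodified 4.20 time, so the "30+%" overhead corresponds to 1216.987 / 915.183 ≈ 1.330 (about 33.0%) on the Xeon and 794.473 / 607.671 ≈ 1.307 (about 30.7%) on the Core i5. A small C helper that reproduces these figures from the reported timings might look like this; the function names are illustrative only and are not part of the patch set.

```c
#include <assert.h>
#include <stdio.h>

/* Slowdown of a patched kernel relative to the unmodified 4.20 baseline,
 * i.e. the "Nx" column: patched system time / baseline system time. */
static double slowdown(double patched_s, double baseline_s)
{
    assert(baseline_s > 0.0);
    return patched_s / baseline_s;
}

/* Overhead in percent: (slowdown - 1) * 100, e.g. 1.330x -> 33.0%. */
static double overhead_pct(double patched_s, double baseline_s)
{
    return (slowdown(patched_s, baseline_s) - 1.0) * 100.0;
}

/* Print one table row: label, system time, slowdown and overhead. */
static void print_row(const char *label, double patched_s, double baseline_s)
{
    printf("%-26s %10.3fs %7.3fx  (+%.1f%%)\n", label, patched_s,
           slowdown(patched_s, baseline_s),
           overhead_pct(patched_s, baseline_s));
}
```

For example, print_row("4.20+XPFO+Deferred flush", 1216.987, 915.183) prints the Xeon row with a 1.330x slowdown and a +33.0% overhead.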
> In the code I added, I post a pending full TLB flush to all other CPUs
> even when the number of TLB entries being flushed on the current CPU
> does not exceed tlb_single_page_flush_ceiling. There has to be a better
> way to do this. I just haven't found an efficient way to implement a
> delayed limited TLB flush on other CPUs.
>
> I am not entirely sure if switch_mm_irqs_off() is indeed the right
> place to perform the pending TLB flush for a CPU. Any feedback on
> that will be very helpful. Delaying full TLB flushes on other CPUs
> seems to help tremendously, so if there is a better way to implement
> the same thing than what I have done in patch 16, I am open to
> ideas.

Dave, Andy, Ingo, Thomas, does anyone have time to look this over?

> Performance with this patch set is good enough to use these as a
> starting point for further refinement before we merge it into the main
> kernel, hence RFC.
>
> Since not flushing stale TLB entries creates a false sense of
> security, I would recommend making the TLB flush mandatory and
> eliminating the "xpfotlbflush" kernel parameter (patch "mm, x86: omit
> TLB flushing by default for XPFO page table modifications").

At this point, yes, that does seem to make sense.

> What remains to be done beyond this patch series:
>
> 1. Performance improvements
> 2. Remove xpfotlbflush parameter
> 3. Re-evaluate the patch "arm64/mm: Add support for XPFO to swiotlb"
>    from Juerg. I dropped it for now since swiotlb code for ARM has
>    changed a lot in 4.20.
> 4. Extend the patch "xpfo, mm: Defer TLB flushes for non-current
>    CPUs" to other architectures besides x86.

This seems like a good plan.

I've put this series in one of my trees so that 0day will find it and
grind tests...
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=kspp/xpfo/v7

Thanks!

-- 
Kees Cook
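The deferral scheme described in the quoted text (a per-task cpumask recording which CPUs still owe a flush, checked when the task is next switched in on a CPU) can be modeled in user space roughly as follows. This is a minimal sketch under stated assumptions, not the code in patch 16: struct model_task, MODEL_NR_CPUS, the model_* functions and the flush counter are all hypothetical, a single 64-bit integer stands in for the kernel's cpumask, and model_switch_to() only marks the point where the patch performs its check in switch_mm_irqs_off().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine size for the model; must fit in a uint64_t mask. */
#define MODEL_NR_CPUS 8
_Static_assert(MODEL_NR_CPUS < 64, "mask is a single uint64_t");

#define MODEL_ALL_CPUS ((UINT64_C(1) << MODEL_NR_CPUS) - 1)

/* Stand-in for the cpumask added to task_struct: bit n set means
 * CPU n still owes this task a full TLB flush. */
struct model_task {
    uint64_t pending_flush;
};

/* Number of full TLB flushes each CPU has actually performed. */
static unsigned model_full_flushes[MODEL_NR_CPUS];

static void model_flush_tlb_local(int cpu)
{
    model_full_flushes[cpu]++;
}

/* XPFO removed a physmap mapping while t runs on cpu: flush this CPU
 * now and post a pending full flush to every other CPU. */
static void model_xpfo_flush(struct model_task *t, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    model_flush_tlb_local(cpu);
    t->pending_flush |= MODEL_ALL_CPUS & ~(UINT64_C(1) << cpu);
}

/* t is switched in on cpu (the patch places this check in
 * switch_mm_irqs_off()): flush only if this CPU has a flush pending. */
static void model_switch_to(struct model_task *t, int cpu)
{
    uint64_t bit = UINT64_C(1) << cpu;

    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (t->pending_flush & bit) {
        model_flush_tlb_local(cpu);
        t->pending_flush &= ~bit;
    }
}
```

With eager flushing, one XPFO unmap on this 8-CPU model costs 8 full flushes. Here a task that only ever runs on CPUs 0 and 3 pays 2: one immediately on CPU 0 and one when it is next switched in on CPU 3. A real implementation would presumably also need atomic updates to the mask, since one CPU can post bits while another clears its own at context switch.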
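The open question about a delayed limited TLB flush could take many forms. One possible shape, sketched here purely as a hypothetical illustration and not as anything in the patch set, is to queue the individual page addresses a CPU owes and fall back to a full flush only once the queue would exceed the single-page ceiling, mirroring the comparison against tlb_single_page_flush_ceiling described in the quoted text. The value 33 is an assumption (to our knowledge the x86 default); struct model_pending and the model_* functions are hypothetical, and locking and per-CPU storage are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ceiling: 33 is, to our knowledge, the x86 default for
 * tlb_single_page_flush_ceiling. */
#define MODEL_FLUSH_CEILING 33

/* Hypothetical record of flush work one CPU owes: either a list of
 * page addresses to invalidate individually, or a full flush. */
struct model_pending {
    int full;
    size_t nr;
    uintptr_t addr[MODEL_FLUSH_CEILING];
};

/* Queue nr_pages pages starting at start. Once the total would exceed the
 * ceiling, stop tracking addresses and owe a full flush instead. */
static void model_defer_range(struct model_pending *p, uintptr_t start,
                              size_t nr_pages, size_t page_size)
{
    if (p->full)
        return;
    if (p->nr + nr_pages > MODEL_FLUSH_CEILING) {
        p->full = 1;
        p->nr = 0;
        return;
    }
    for (size_t i = 0; i < nr_pages; i++)
        p->addr[p->nr++] = start + i * page_size;
}

/* Settle the debt, e.g. at context switch. Returns the number of
 * single-page invalidations that would be issued, or -1 for a full
 * flush. This model only counts; it invalidates nothing. */
static long model_drain(struct model_pending *p)
{
    long n = p->full ? -1 : (long)p->nr;

    p->full = 0;
    p->nr = 0;
    return n;
}
```

Draining after 4 queued pages reports 4 single-page invalidations; queuing 30 pages and then 4 more crosses the ceiling, so the drain reports a full flush (-1).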