Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
From: Andy Lutomirski
Date: Wed, 21 Nov 2018 11:26:09 -0700
To: Jens Axboe, dave.hansen@intel.com
Cc: Linus Torvalds, pabeni@redhat.com, Ingo Molnar, Thomas Gleixner,
    Ingo Molnar, bp@alien8.de, Peter Anvin, the arch/x86 maintainers,
    Andrew Morton, Andrew Lutomirski, Peter Zijlstra, dvlasenk@redhat.com,
    brgerst@gmail.com, Linux List Kernel Mailing
Message-Id: <9E7DFB44-8A2A-48CF-972E-6CB5122CCA20@amacapital.net>
In-Reply-To: <658cdb28-e3e5-c0af-368f-c26daf9986ac@kernel.dk>
References: <02bfc577-32a5-66be-64bf-d476b7d447d2@kernel.dk>
    <20181121063609.GA109082@gmail.com>
    <48e27a3a-2bb2-ff41-3512-8aeb3fd59e57@kernel.dk>
    <1c22125bb5d22c2dcd686d0d3b390f115894f746.camel@redhat.com>
    <658cdb28-e3e5-c0af-368f-c26daf9986ac@kernel.dk>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

> On Nov 21, 2018, at 11:04 AM, Jens Axboe wrote:
>
>> On 11/21/18 10:27 AM, Linus Torvalds wrote:
>>> On Wed, Nov 21, 2018 at 5:45 AM Paolo Abeni wrote:
>>>
>>> In my experiments 64 bytes was the break even point for all the CPUs I
>>> had handy, but I guess that may change with other models.
>>
>> Note that experiments with memcpy speed are almost invariably broken.
>> microbenchmarks don't show the impact of I$, but they also don't show
>> the impact of _behavior_.
>>
>> For example, there might be things like "repeat strings do cacheline
>> optimizations" that end up meaning that cachelines stay in L2, for
>> example, and are never brought into L1. That can be a really good
>> thing, but it can also mean that now the result isn't as close to the
>> CPU, and the subsequent use of the cacheline can be costlier.
>
> Totally agree, which is why all my testing was NOT microbenchmarking.
>
>> I say "go for upping the limit to 128 bytes".
>
> See below...
>
>> That said, if the aio user copy is _so_ critical that it's this
>> noticeable, there may be other issues. Sometimes _real_ cost of small
>> user copies is often the STAC/CLAC, more so than the "rep movs".
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much... *inlining* the erms case might show that nicely in
>> profiles.
>
> Oh I totally agree, which is why I since went a different route. The
> copy that matters is the copy_from_user() of the iocb, which is 64
> bytes. Even for 4k IOs, copying 64b per IO is somewhat counter
> productive for O_DIRECT.

Can we maybe use this as an excuse to ask for some reasonable
instructions to access user memory? Intel already did all the dirty
work of giving something resembling sane semantics for the kernel doing
a user-privileged access with WRUSS. How about WRUSER, RDUSER, and maybe
even the REP variants? And I suppose LOCK CMPXCHGUSER.
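
To make the contrast concrete, here is a rough userspace model of the two shapes of user copy being compared: the existing one, where a STAC/CLAC pair opens and closes a window around ordinary loads, and one built on a dedicated user-load instruction. This is only a sketch. RDUSER does not exist, so model_rduser() is a made-up stand-in, and ac_flag, ac_toggles and every function name below are illustrative assumptions rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. Real STAC/CLAC are ring-0 instructions, and
 * RDUSER is a hypothetical instruction, so nothing here touches real
 * CPU state.
 */
static int ac_flag;              /* stands in for EFLAGS.AC */
static unsigned int ac_toggles;  /* number of modelled STAC/CLAC executions */

static void model_stac(void) { ac_flag = 1; ac_toggles++; }
static void model_clac(void) { ac_flag = 0; ac_toggles++; }

/* Ordinary load from user memory: with SMAP it faults unless AC is set. */
static unsigned char plain_user_load(const unsigned char *uaddr)
{
    assert(ac_flag && "SMAP would fault here");
    return *uaddr;
}

/* Hypothetical RDUSER: the instruction itself says "this is a user access". */
static unsigned char model_rduser(const unsigned char *uaddr)
{
    return *uaddr;
}

/* Current shape: open the window, copy, close the window. */
static void copy_with_window(unsigned char *dst, const unsigned char *usrc,
                             size_t len)
{
    model_stac();
    for (size_t i = 0; i < len; i++)
        dst[i] = plain_user_load(usrc + i);
    model_clac();
}

/* Proposed shape: no AC toggling anywhere in the copy. */
static void copy_with_rduser(unsigned char *dst, const unsigned char *usrc,
                             size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = model_rduser(usrc + i);
}
```

In this model a 64-byte copy_with_window() call bumps ac_toggles by two however short the copy is, while copy_with_rduser() leaves it untouched. That fixed per-copy pair is the part the proposed instructions would remove.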

Or Intel could try to make STAC and CLAC be genuinely fast (0 or 1
cycles and no stalls *ought* to be possible if it were handled in the
front end, as long as there aren’t any PUSHF or POPF instructions in
the pipeline). As it stands, I assume that both instructions prevent
any following memory accesses from starting until they retire, and they
might even be nastily microcoded to handle the overloading of AC.
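
As a back-of-the-envelope illustration of why this matters most for small copies such as the 64-byte iocb, the fixed cost of the STAC/CLAC pair can be set against the size-dependent cost of the copy itself. The helper below is a deliberately crude linear model, not a measurement. ac_overhead_share() and its parameters are hypothetical names, the cycle figures are whatever the caller chooses to plug in, and any stall caused by the pair is assumed to be folded into the fixed term.

```c
#include <stddef.h>

/*
 * Fraction of a user copy's total cost spent on the STAC/CLAC pair,
 * under a simple linear model: total = fixed pair cost + per-byte cost.
 * All inputs are caller-supplied placeholders, not measured values.
 */
static double ac_overhead_share(double stac_clac_cycles,
                                double cycles_per_byte,
                                size_t len)
{
    double copy_cycles = cycles_per_byte * (double)len;
    double total = stac_clac_cycles + copy_cycles;

    /* Guard against a zero-cost, zero-length corner case. */
    return total > 0.0 ? stac_clac_cycles / total : 0.0;
}
```

For a fixed pair cost, the share shrinks as len grows and drops to zero if the pair itself costs nothing, which is the case the paragraph above is asking for.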