From: Andiry Xu
Date: Mon, 19 Mar 2018 14:59:49 -0700
Subject: Re: [RFC v2 05/83] Add NOVA filesystem definitions and useful helper routines.
To: Eric Biggers
Cc: Nikolay Borisov, Linux FS Devel, Linux Kernel Mailing List, "linux-nvdimm@lists.01.org", Dan Williams, "Rudoff, Andy", coughlan@redhat.com, Steven Swanson, Dave Chinner, Jan Kara, swhiteho@redhat.com, miklos@szeredi.hu, Jian Xu, Andiry Xu, Herbert Xu

On Mon, Mar 19, 2018 at 1:30 PM, Eric Biggers wrote:
> On Mon, Mar 19, 2018 at 12:39:55PM -0700, Andiry Xu wrote:
>> On Sun, Mar 11, 2018 at 12:22 PM, Eric Biggers wrote:
>> > On Sun, Mar 11, 2018 at 02:00:13PM +0200, Nikolay Borisov wrote:
>> >> [Adding Herbert Xu to CC since he is the maintainer of the crypto
>> >> subsystem]
>> >>
>> >> On 10.03.2018 20:17, Andiry Xu wrote:
>> >>
>> >> > +static inline u32 nova_crc32c(u32 crc, const u8 *data, size_t len)
>> >> > +{
>> >> > +	u8 *ptr = (u8 *) data;
>> >> > +	u64 acc = crc; /* accumulator, crc32c value in lower 32b */
>> >> > +	u32 csum;
>> >> > +
>> >> > +	/* x86 instruction crc32 is part of SSE-4.2 */
>> >> > +	if (static_cpu_has(X86_FEATURE_XMM4_2)) {
>> >> > +		/* This inline assembly implementation should be equivalent
>> >> > +		 * to the kernel's crc32c_intel_le_hw() function used by
>> >> > +		 * crc32c(), but this performs better on test machines.
>> >> > +		 */
>> >> > +		while (len > 8) {
>> >> > +			asm volatile(/* 64b quad words */
>> >> > +				"crc32q (%1), %0"
>> >> > +				: "=r" (acc)
>> >> > +				: "r" (ptr), "0" (acc)
>> >> > +			);
>> >> > +			ptr += 8;
>> >> > +			len -= 8;
>> >> > +		}
>> >> > +
>> >> > +		while (len > 0) {
>> >> > +			asm volatile(/* trailing bytes */
>> >> > +				"crc32b (%1), %0"
>> >> > +				: "=r" (acc)
>> >> > +				: "r" (ptr), "0" (acc)
>> >> > +			);
>> >> > +			ptr++;
>> >> > +			len--;
>> >> > +		}
>> >> > +
>> >> > +		csum = (u32) acc;
>> >> > +	} else {
>> >> > +		/* The kernel's crc32c() function should also detect and use the
>> >> > +		 * crc32 instruction of SSE-4.2. But calling in to this function
>> >> > +		 * is about 3x to 5x slower than the inline assembly version on
>> >> > +		 * some test machines.
>> >>
>> >> That is really odd. Did you try to characterize why this is the case? Is
>> >> it purely the overhead of dispatching to the correct backend function?
>> >> That's a rather big performance hit.
>> >>
>> >> > +		 */
>> >> > +		csum = crc32c(crc, data, len);
>> >> > +	}
>> >> > +
>> >> > +	return csum;
>> >> > +}
>> >> > +
>> >
>> > Are you sure that CONFIG_CRYPTO_CRC32C_INTEL was enabled during your tests and
>> > that the accelerated version was being called? Or, perhaps CRC32C_PCL_BREAKEVEN
>> > (defined in arch/x86/crypto/crc32c-intel_glue.c) needs to be adjusted. Please
>> > don't hack around performance problems like this; if they exist, they need to be
>> > fixed for everyone.
>> >
>>
>> I ran the crc32c test on a platform with a Xeon X5647 at 2.93GHz and
>> 14G of DDR3 memory at 1066MHz.
>> You are right that enabling CONFIG_CRYPTO_CRC32C_INTEL improves the
>> performance significantly. nova_crc32c() is still slightly faster than
>> crc32c() with the flag enabled.
>>
>> The results are as follows (data size in bytes, latency in ns). Column
>> 3 is crc32c() with CONFIG_CRYPTO_CRC32C_INTEL enabled and column 4 is
>> crc32c() with it disabled.
>>
>> data size (bytes)  nova_crc32c()  crc32c()-enabled  crc32c()-disabled
>> 64                 19             21                56
>> 128                28             29                99
>> 256                46             43                182
>> 512                82             149               354
>> 1024               157            232               728
>> 2048               305            415               1440
>> 4096               603            725               2869
>>
>
> Probably CRC32C_PCL_BREAKEVEN needs to be adjusted for that CPU, as I suggested
> may be the case; notice that your measured speeds are about the same before 512
> (CRC32C_PCL_BREAKEVEN) bytes, but the crypto API version is slower at >= 512
> bytes. It would be possible to set the breakeven point in
> crc32c_intel_mod_init() depending on the CPU. Again, if the performance is not
> good enough you need to fix it for everyone, not hack around it.
>

We verified that with CRC32C_PCL_BREAKEVEN set to 8192, the performance
difference between nova_crc32c() and the kernel's crc32c() is negligible.

Thanks for the comments. I will use the kernel's crc32c() in the next
version.

Thanks,
Andiry
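
For reference, the breakeven idea discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: crc32c_breakeven, crc32c_bitwise(), crc32c_bytewise(), crc32c_dispatch() and crc32c_selfcheck() are hypothetical names, and two portable software routines stand in for the short-buffer and bulk hardware paths. Like nova_crc32c() in the quoted patch, each routine takes a seed and applies no inversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected Castagnoli polynomial used by CRC32C. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/* Hypothetical runtime-tunable threshold. The thread mentions a
 * compile-time CRC32C_PCL_BREAKEVEN of 512 in the kernel; making it a
 * variable here is an assumption, to show per-CPU tuning at init time. */
static size_t crc32c_breakeven = 512;

/* Bit-at-a-time path: stands in for the routine used on short buffers. */
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *data, size_t len)
{
	while (len--) {
		crc ^= *data++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return crc;
}

static uint32_t crc32c_table[256];
static int crc32c_table_ready;

/* Build the 256-entry lookup table on first use. */
static void crc32c_table_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int k = 0; k < 8; k++)
			c = (c >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (c & 1u)));
		crc32c_table[i] = c;
	}
	crc32c_table_ready = 1;
}

/* Table-driven path: stands in for the routine used on long buffers. */
static uint32_t crc32c_bytewise(uint32_t crc, const uint8_t *data, size_t len)
{
	if (!crc32c_table_ready)
		crc32c_table_init();
	while (len--)
		crc = (crc >> 8) ^ crc32c_table[(crc ^ *data++) & 0xFF];
	return crc;
}

/* Pick a path by length. Both paths must return the same value, so the
 * threshold only affects speed, never the checksum. */
static uint32_t crc32c_dispatch(uint32_t crc, const uint8_t *data, size_t len)
{
	if (len >= crc32c_breakeven)
		return crc32c_bytewise(crc, data, len);
	return crc32c_bitwise(crc, data, len);
}

/* Sanity check against the standard CRC-32C check value for "123456789",
 * which is defined with an all-ones seed and a final inversion. */
static void crc32c_selfcheck(void)
{
	static const uint8_t msg[] = "123456789";

	assert(crc32c_bitwise(~0u, msg, 9) == crc32c_bytewise(~0u, msg, 9));
	assert((uint32_t)~crc32c_dispatch(~0u, msg, 9) == 0xE3069283u);
}
```

Because the result does not depend on which path runs, a threshold like this can be retuned per CPU (for example from 512 to 8192, as in the measurement above) without touching any caller.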