Date: Sat, 28 May 2022 18:10:48 -0700
Subject: Re: [PATCH 0/2] x86: Optimize memchr() for x86-64
To: Yu-Jen Chang, jdike@linux.intel.com
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, keescook@chromium.org, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, richard@nod.at, anton.ivanov@cambridgegreys.com, johannes@sipsolutions.net, linux-um@lists.infradead.org, jserv@ccns.ncku.edu.tw
From: Andi Kleen
In-Reply-To: <20220528081236.3020-1-arthurchang09@gmail.com>

On 5/28/2022 1:12 AM, Yu-Jen Chang wrote:
> *** BLURB HERE ***
> This patch series adds an optimized "memchr()" for x86-64 and
> USER-MODE LINUX (UML).
>
> There exists an assembly implementation for x86-32. However,
> for x86-64, there isn't any optimized version. We implement word-wise
> comparison so that 8 characters can be compared at the same time on
> x86-64 CPU. The optimized “memchr()” is nearly 4x faster than the
> original implementation for long strings.
>
> We test the optimized “memchr()” in UML and also recompile the 5.18
> Kernel with the optimized “memchr()”. They run correctly.
>
> In this patch we add a new file "string_64.c", which only contains
> "memchr()". We can add more optimized string functions in it in the
> future.

Are there any workloads that care? From a quick grep I don't see any
that look performance critical.

It would be good to describe what you optimized it for. For example,
optimization for small input strings is quite different than for large
strings. I don't know which is more common in the kernel.

I assume you ran it through some existing test suites for memchr (like
glibc etc.) for correctness testing? (Bugs in optimized string
functions are often subtle; it might also be worth trying some
randomized testing comparing against a known reference.)

-Andi
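
To make the two technical points in this thread concrete (the word-wise comparison described in the cover letter, and randomized testing against a known reference), here is a minimal user-space sketch in C. It is not the code from the patch series: memchr_wordwise(), run_random_tests() and BUF_MAX are hypothetical names, the zero-byte bit trick is just one common way to do word-wise matching, and the C library's memchr() is assumed to be the trusted reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical word-wise memchr sketch; NOT the implementation from the patch. */
static void *memchr_wordwise(const void *s, int c, size_t n)
{
    const unsigned char *p = s;
    const unsigned char ch = (unsigned char)c;
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;
    const uint64_t pattern = ones * ch; /* target byte replicated 8 times */

    /* Byte-wise head: advance until p is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)p & 7) != 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }

    /* Word-wise body: examine 8 bytes per iteration. */
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, p, sizeof w);      /* load one word without aliasing issues */
        w ^= pattern;                 /* bytes equal to ch become zero */
        if ((w - ones) & ~w & highs)  /* nonzero iff some byte of w is zero */
            break;                    /* the tail loop locates the exact byte */
        p += 8;
        n -= 8;
    }

    /* Byte-wise tail: leftover bytes, or the word flagged above. */
    while (n > 0) {
        if (*p == ch)
            return (void *)p;
        p++;
        n--;
    }
    return NULL;
}

enum { BUF_MAX = 256 }; /* assumed upper bound on tested lengths */

/*
 * Randomized differential test: compare memchr_wordwise() with the C
 * library's memchr() on random contents, lengths, alignments and target
 * bytes. Returns the number of mismatches (0 means all cases agreed).
 */
static unsigned long run_random_tests(unsigned long iterations,
                                      size_t max_len, unsigned int seed)
{
    static unsigned char buf[BUF_MAX + 8];
    unsigned long mismatches = 0;

    assert(max_len <= BUF_MAX);
    srand(seed); /* fixed seed keeps any failure reproducible */

    for (unsigned long i = 0; i < iterations; i++) {
        size_t off = (size_t)rand() % 8;             /* vary start alignment */
        size_t len = (size_t)rand() % (max_len + 1); /* includes length 0 */
        int c = rand() & 0xff;

        for (size_t j = 0; j < sizeof buf; j++)
            buf[j] = (unsigned char)(rand() & 0xff);

        if (memchr_wordwise(buf + off, c, len) != memchr(buf + off, c, len))
            mismatches++;
    }
    return mismatches;
}
```

A harness could call run_random_tests(20000, 256, 1) and check that the result is 0. Small max_len values mostly exercise the byte-wise head and tail, while larger ones spend their time in the word loop, which is the small-versus-large input distinction raised above.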