Date: Tue, 25 Oct 2022 21:32:46 +0200
Subject: Re: RE: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
From: Christophe JAILLET
To: Nathan Moinvaziri, Andy Shevchenko
Cc: linux-kernel@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 25/10/2022 at 19:53, Nathan Moinvaziri wrote:
> Hi Andy,
>
> I appreciate your quick feedback!
>
> I have done as you suggested and published my results this time using Google benchmark:
> https://github.com/nmoinvaz/strcasecmp

Hi,

The algorithm on GitHub is not the same as the one posted here.

IIUC, the one on GitHub is wrong. If you compare 2 strings that are the same, they will have the same length, and "if (c1 == c2) continue;" will go one past the end of the strings. The result will then be <0, 0 or >0 depending on the char *after* the trailing \0.

On the other hand, the benchmark results on GitHub likely do not reflect the algorithm posted here, because the posted one has one more test in each loop iteration ("while (c1 != 0)") as long as the 2 strings are the same. On GitHub this test is skipped because you go through the "continue".

CJ

>
> After you review it, and if you still think the patch is worthwhile then I can fix the other problems you mentioned for the original patch. If you think it is not worth it, then I understand.
>
> Thanks again,
> Nathan
>
> -----Original Message-----
> From: Andy Shevchenko
> Sent: Tuesday, October 25, 2022 2:04 AM
> To: Nathan Moinvaziri
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] lib/string.c: Improve strcasecmp speed by not lowering if chars match
>
> On Tue, Oct 25, 2022 at 11:00:36AM +0300, Andy Shevchenko wrote:
>> On Tue, Oct 25, 2022 at 4:46 AM Nathan Moinvaziri wrote:
>
> ...
>
>>> When running tests using Quick Benchmark with two matching 256
>>> character strings these changes result in anywhere between ~6-9x speed improvement.
>>>
>>> * We use unsigned char instead of int similar to strncasecmp.
>>> * We only subtract c1 - c2 when they are not equal.
>
> ...
>
>> You tell us that this is more performant, but have not provided the
>> numbers. Can we see those, please?
>
> So, I have read carefully and see the reference to some QuickBenchmark I have no idea about. What I meant here is to have numbers provided by an (open source) tool (maybe even in-kernel test case) that anybody can test on their machines. You also missed details about how you ran it, what data set has been used, etc.
>
>> Note, that you basically trash CPU cache lines when characters are not
>> equal, and before doing that you have a branching. I'm unsure that
>> your way is more performant than the original one.
>
> --
> With Best Regards,
> Andy Shevchenko
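
To make the "continue" remark above concrete, here is a minimal sketch of the two loop shapes under discussion. It is an illustration only: both function names are hypothetical, and neither body is copied from the GitHub repository or from the posted patch. The first function assumes a loop in which equal characters hit "continue" before any terminator test; the second keeps the "while (c1 != 0)" test on every iteration.

```c
#include <ctype.h>

/*
 * Hypothetical reconstruction of the flawed shape: when both characters
 * are equal we "continue" without checking for '\0', so two identical
 * strings are read past their terminators.
 */
static int casecmp_continue_first(const char *s1, const char *s2)
{
	unsigned char c1, c2;

	for (;;) {
		c1 = *s1++;
		c2 = *s2++;
		if (c1 == c2)
			continue;	/* also taken when both are '\0' */
		c1 = tolower(c1);
		c2 = tolower(c2);
		if (c1 != c2)
			return (int)c1 - (int)c2;
	}
}

/*
 * Sketch of a loop that only lowers on mismatch but still tests for the
 * terminator on every iteration (assumed shape, not the exact patch).
 */
static int casecmp_checked(const char *s1, const char *s2)
{
	unsigned char c1, c2;

	do {
		c1 = *s1++;
		c2 = *s2++;
		if (c1 != c2) {
			c1 = tolower(c1);
			c2 = tolower(c2);
			if (c1 != c2)
				return (int)c1 - (int)c2;
		}
	} while (c1 != 0);

	return 0;
}
```

To observe the difference without invoking undefined behaviour, one can place distinct bytes after the terminator inside the same array, for example `char a[] = "abc\0X";` and `char b[] = "abc\0Y";`. Both hold the string "abc", yet the first function compares 'x' with 'y' and returns -1, while the second returns 0. On ordinary strings the first function would read whatever happens to follow the terminator, which is the out-of-bounds access described above.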