From: Andy Shevchenko
Date: Tue, 12 Jul 2022 17:08:06 +0200
Subject: Re: [PATCH 2/2] lib/string.c: Optimize memchr()
To: Yu-Jen Chang
Cc: Andrey Semashev, Andy Shevchenko, Akinobu Mita, Ching-Chun Huang, Linux Kernel Mailing List
References: <20220710142822.52539-1-arthurchang09@gmail.com> <20220710142822.52539-3-arthurchang09@gmail.com> <3a1b50d2-a7aa-3e89-56fe-5d14ef9da22f@gmail.com> <48db247e-f6fd-cb4b-7cc5-455bf26bb153@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 12, 2022 at 4:58 PM Yu-Jen Chang wrote:
> Andrey Semashev wrote on Mon, Jul 11, 2022 at 11:00 PM:
> > On 7/11/22 17:52, Yu-Jen Chang wrote:
> > > Andrey Semashev wrote on Mon, Jul 11, 2022 at 4:01 AM:
> > >> On 7/10/22 17:28, Yu-Jen Chang wrote:

...

> > >>> +       for (; p <= end - 8; p += 8) {
> > >>> +               val = *(u64 *)p ^ mask;
> > >>
> > >> What if p is not aligned to 8 (or 4 on 32-bit targets) bytes? Not all
> > >> targets support (efficient) unaligned loads, do they?
> > >
> > > I think it works if p is not aligned to 8 or 4 bytes.
> > >
> > > Let's say the string is 10 bytes. The for loop here will search the first
> > > 8 bytes. If the target character is in the last 2 bytes, the second for
> > > loop will find it. It also work like this on 32-bit machine.
> >
> > I think you're missing the point. Loads at unaligned addresses may not
> > be allowed by hardware using conventional load instructions or may be
> > inefficient. Given that this memchr implementation is used as a fallback
> > when no hardware-specific version is available, you should be
> > conservative wrt. hardware capabilities and behavior. You should
> > probably have a pre-alignment loop.
>
> Got it. I add pre-alignment loop. It aligns the address to 8 or 4bytes.

This is still far from what can be accepted. Have you had a chance to read
how strscpy() is implemented? Do you understand why it's done that way?

> void *memchr(const void *p, int c, size_t length)
> {
>     u64 mask, val;
>     const void *end = p + length;
>     c &= 0xff;
>     while ((long ) p & (sizeof(long) - 1)) {
>         if (p >= end)
>             return NULL;
>         if (*(unsigned char *)p == c)
>             return (void *) p;
>         p++;
>     }
>     if (p <= end - 8) {
>         mask = c;
>         MEMCHR_MASK_GEN(mask);
>
>         for (; p <= end - 8; p += 8) {

Why did you decide that this code will run only on 64-bit architectures?

>             val = *(u64*)p ^ mask;
>             if ((val + 0xfefefefefefefeffull)
>                 & (~val & 0x8080808080808080ull))
>                 break;
>         }
>     }
>
>     for (; p < end; p++)
>         if (*(unsigned char *)p == c)
>             return (void *)p;
>
>     return NULL;
> }

-- 
With Best Regards,
Andy Shevchenko
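
Editorial illustration: the two review points raised in this thread (scan byte by byte until the pointer is word-aligned, and let the word size follow the target instead of hard-coding u64 and 8) can be sketched in portable userspace C as below. This is a minimal sketch under stated assumptions, not the kernel's implementation and not a revision of the patch: the names memchr_sketch, repeat_byte and has_zero_byte are hypothetical, and unsigned long is assumed to be the natural machine word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy byte c into every byte of an unsigned long. */
static unsigned long repeat_byte(unsigned char c)
{
	return (~0UL / 0xff) * c;	/* 0x0101...01 * c */
}

/* Hypothetical helper: nonzero if at least one byte of v is zero. */
static unsigned long has_zero_byte(unsigned long v)
{
	const unsigned long ones = ~0UL / 0xff;	/* 0x0101...01 */
	const unsigned long highs = ones << 7;	/* 0x8080...80 */

	return (v - ones) & ~v & highs;
}

/* Hypothetical name; same contract as the standard memchr(). */
void *memchr_sketch(const void *s, int c, size_t n)
{
	const unsigned char *p = s;
	const unsigned char ch = (unsigned char)c;

	/* Pre-alignment loop: byte steps until p is word-aligned. */
	while (n && ((uintptr_t)p & (sizeof(unsigned long) - 1))) {
		if (*p == ch)
			return (void *)p;
		p++;
		n--;
	}

	/* Word loop: the step is sizeof(unsigned long), 4 or 8 bytes. */
	if (n >= sizeof(unsigned long)) {
		const unsigned long mask = repeat_byte(ch);

		while (n >= sizeof(unsigned long)) {
			unsigned long word;

			/* p is aligned here; memcpy sidesteps aliasing rules. */
			memcpy(&word, p, sizeof(word));
			if (has_zero_byte(word ^ mask))
				break;	/* a match is somewhere in this word */
			p += sizeof(word);
			n -= sizeof(word);
		}
	}

	/* Tail loop: locate the match, or finish the last few bytes. */
	for (; n; p++, n--)
		if (*p == ch)
			return (void *)p;

	return NULL;
}
```

Because both the alignment mask and the loop step are derived from sizeof(unsigned long), the same source does 4-byte steps on a 32-bit target and 8-byte steps on a 64-bit one, and no load wider than a byte is issued at an unaligned address.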