Date: Fri, 1 Sep 2023 14:27:28 +0700
From: Ammar Faizi
To: Willy Tarreau
Cc: Thomas Weißschuh, Nicholas Rosenberg, Alviro Iskandar Setiawan,
    Michael William Jonathan, GNU/Weeb Mailing List,
    Linux Kernel Mailing List
Subject: Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
References: <20230830135726.1939997-1-ammarfaizi2@gnuweeb.org>
    <20230830135726.1939997-4-ammarfaizi2@gnuweeb.org>

On Fri, Sep 01, 2023 at 05:35:08AM +0200, Willy Tarreau wrote:
> On Fri, Sep 01, 2023 at 10:24:42AM +0700, Ammar Faizi wrote:
> > After thinking about this more, I think I'll drop the memcmp() patch
> > because it will prevent optimization when comparing a small value.
> >
> > For example, without __asm__:
> >
> >     memcmp(var, "abcd", 4);
> >
> > may compile to:
> >
> >     cmpl $0x64636261, %reg
> >     ...something...
> >
> > But with __asm__, the compiler can't do that. Thus, it's not worth
> > optimizing the memcmp() in this case.
>
> Ah you're totally right!

So, it turns out that this assumption is wrong. The compiler cannot
optimize the current memcmp() into that.
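For context, the byte-at-a-time loop under discussion can be sketched
roughly as follows. This is a minimal illustration, not a verbatim copy
of the nolibc source; the names `bytewise_memcmp` and `bytewise_demo`
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a byte-at-a-time comparison with early exit. */
static int bytewise_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Early exit: bytes after the first mismatch are never read. */
		if (a[i] != b[i])
			return (int)a[i] - (int)b[i];
	}
	return 0;
}

/* Small self-check of the behavior described in the comments. */
static void bytewise_demo(void)
{
	/* Only 3 bytes long, yet comparing with n = 4 is fine here:
	 * the loop stops at index 2 and never touches index 3. */
	const char buf[3] = {'a', 'b', 'x'};

	assert(bytewise_memcmp(buf, "abcd", 4) > 0);
	assert(bytewise_memcmp("abcd", "abcd", 4) == 0);
}
```

The early return inside the loop is what matters here: as written, the
C code promises that no byte after the first mismatch is ever read.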
I just posted a question on SO:

  https://stackoverflow.com/questions/77020562/what-prevents-the-compiler-from-optimizing-a-hand-written-memcmp

Given:
```
bool test_data(void *data)
{
	return memcmp(data, "abcd", 4) == 0;
}
```

The result when using the default memcmp() (good):
```
test_data:
	cmpl	$1684234849, (%rdi)
	sete	%al
	ret
```

The result when using the nolibc memcmp() (bad):
```
test_data:
	cmpb	$97, (%rdi)
	jne	.L5
	cmpb	$98, 1(%rdi)
	jne	.L5
	cmpb	$99, 2(%rdi)
	jne	.L5
	cmpb	$100, 3(%rdi)
	sete	%al
	ret
.L5:
	xorl	%eax, %eax
	ret
```

Link: https://godbolt.org/z/TT94r3bvf

This is because, apart from honoring the input length, the current
nolibc `memcmp()` must stop reading as soon as it finds a mismatching
byte.

Imagine what happens if we call:
```
char xstr[] = {'a', 'b', 'x'};
test_data(xstr);
```

In that case, a dword cmp would read past the end of xstr, which can
even lead to a segfault in particular circumstances (for example, when
xstr sits right at the end of a mapped page).

What the current nolibc memcmp() does from the C language view:

  1) Compare one byte at a time.
  2) Stop reading further bytes once it finds a mismatching byte.

Because of point (2), the compiler is not allowed to optimize the
nolibc memcmp() into a wider load; otherwise, it may hit a segfault.
That also means it cannot vectorize the memcmp() loop. On the other
hand, memcpy() and memset() don't have such a restriction, so they can
be vectorized.

The real memcmp() assumes that both sources are at least `n` bytes
long, which allows a wider load. The current nolibc memcmp()
implementation doesn't reflect that assumption in the C code.

IOW, calling the real built-in memcmp() is undefined behavior for this
code:
```
char x = 'q';
return memcmp(&x, "abcd", 4);
```

but with the current nolibc memcmp() it is well-defined behavior (well,
it must be, as that is what the C code reflects).

We can improve the nolibc memcmp() by casting the sources to a wider
type like (ulong, uint, ushort). But that's another story for another
RFC patchset.

-- 
Ammar Faizi
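
As a rough illustration of the wider-type idea mentioned above, a
word-at-a-time comparison could look like the sketch below. It is not a
proposed patch. The names `load_word`, `wide_memcmp` and `wide_demo`
are hypothetical, the wide load uses memcpy() from a hosted <string.h>
instead of a pointer cast (a substitution made here to sidestep
alignment and strict-aliasing concerns), and the code assumes both
buffers really have `n` readable bytes, as the standard memcmp()
contract does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: load one word from a possibly unaligned address. */
static unsigned long load_word(const unsigned char *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical name. Assumes s1 and s2 both have n readable bytes. */
static int wide_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	/* Word-sized steps: skip over chunks that are entirely equal. */
	while (n >= sizeof(unsigned long) && load_word(a) == load_word(b)) {
		a += sizeof(unsigned long);
		b += sizeof(unsigned long);
		n -= sizeof(unsigned long);
	}

	/* Byte-sized tail: handles the remainder and also locates the
	 * first differing byte inside a mismatching word, so the sign
	 * of the result does not depend on endianness. */
	while (n--) {
		if (*a != *b)
			return (int)*a - (int)*b;
		a++;
		b++;
	}
	return 0;
}

/* Small self-check of the behavior described in the comments. */
static void wide_demo(void)
{
	assert(wide_memcmp("abcdefghij", "abcdefghij", 10) == 0);
	assert(wide_memcmp("abXdefghij", "abcdefghij", 10) < 0);
	assert(wide_memcmp("abd", "abc", 3) > 0);
}
```

Unlike the byte-at-a-time loop, this version may read up to a full word
beyond the first mismatching byte (but never beyond `n`), which is
exactly the freedom the C code has to express before the compiler can
use wider loads.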