Received: by 2002:a05:6359:6284:b0:131:369:b2a3 with SMTP id se4csp1831452rwb; Sun, 6 Aug 2023 04:09:16 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEVNJf8TdvjCYgW7vk7gC1Y4SgotqRtDSrYclmy+Em4XP3fXOcffGgYZy1BZo/OpgC41Oe+ X-Received: by 2002:a17:90a:7442:b0:268:c6f:3b3d with SMTP id o2-20020a17090a744200b002680c6f3b3dmr6163051pjk.4.1691320156626; Sun, 06 Aug 2023 04:09:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691320156; cv=none; d=google.com; s=arc-20160816; b=LGfFmelu50o9MWA8Y53KrVuiDE9rY3pL4XSoH2KbwvGCYieNTWkI8eXf1zQtRu7Mnp gEZEcdrz3hEE0mY+j26fZp/oNckhYBBtiuBpxBafZs4gK3RRC60YRCn5ZM7berxF6djs 2+8BnCbXaY6LEm3sbzKnGIIXCjOadj3fU6XpgGsiNJ6SxGEtA3Jmlmm3Qo8CxYF3GFUT BXzY6e/ShOLdhWkcnj8jWvcj9tWIs4E8QqWeFXBjsgA25KLBOmyOiQ8N7dNzWpvZzOq5 t/WExQvtLmCTK60WC6JRHbW1jP+Rqzy/2HhCPdM+fEajJTYiESSe2YfK1IoxObB6XWET YGkw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:user-agent:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=MBYmPX1Y0hRotsmhOJIY2h+FXNYd94nj87i6l3yt5kA=; fh=hWvSKgORhUBL3eQSpnllPnYtc0jAAywWzmx4zVsB6ZE=; b=ZREwW0Dy7mTEwo2ySHwWK3XVkJ3jYKxGR69n4PSFRW+r3yUmrKb3efmLN+xXGVA1Kw Af/7QCOO6lty+n/ERmjvo9ipymWYSDk8PaZcX6cFqP3DX6Rr3TYD89+WfR+JL6biZ6ov yxJPgv7+mIbU1X2FcfSYVgAXwzGjMAFuZJ1YYgMOvRBQFnQExjaOJHDrfUAT5xnhsOrg M93LLXusUoY6FO+vVb8juvnpMk8KFAP0ov0uOksuAJepUz8SyqIKeulx4szkaxn0ZPvF edJf5uBbuSxUrCU7HjCI9ndrmYcRcakY3h6KkJLT93Qm+UXdq9XUlnV7pKK1zBaxjg04 EaXg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id np13-20020a17090b4c4d00b00263aff4ccf0si4759679pjb.3.2023.08.06.04.09.05; Sun, 06 Aug 2023 04:09:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230160AbjHFJ7E (ORCPT + 99 others); Sun, 6 Aug 2023 05:59:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40896 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229449AbjHFJ7C (ORCPT ); Sun, 6 Aug 2023 05:59:02 -0400 Received: from 1wt.eu (ded1.1wt.eu [163.172.96.212]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 096A011D; Sun, 6 Aug 2023 02:59:00 -0700 (PDT) Received: (from willy@localhost) by pcw.home.local (8.15.2/8.15.2/Submit) id 3769wk4o012323; Sun, 6 Aug 2023 11:58:46 +0200 Date: Sun, 6 Aug 2023 11:58:46 +0200 From: Willy Tarreau To: Zhangjin Wu Cc: arnd@arndb.de, david.laight@aculab.com, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, thomas@t-8ch.de Subject: Re: [PATCH v6 06/15] tools/nolibc: __sysret: support syscalls who return a pointer Message-ID: <20230806095846.GB10627@1wt.eu> References: <11fd815773f7d202d329ac2c64cd6980b32e4fcf.1688739492.git.falcon@tinylab.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <11fd815773f7d202d329ac2c64cd6980b32e4fcf.1688739492.git.falcon@tinylab.org> User-Agent: Mutt/1.10.1 (2018-07-13) X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_BLOCKED,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Zhangjin, On Fri, Jul 07, 2023 at 10:56:59PM +0800, Zhangjin Wu wrote: > No official reference states the errno range, here aligns with musl and > glibc and uses [-MAX_ERRNO, -1] instead of all negative ones. > > - musl: src/internal/syscall_ret.c > - glibc: sysdeps/unix/sysv/linux/sysdep.h > > The MAX_ERRNO used by musl and glibc is 4095, just like the one nolibc > defined in tools/include/nolibc/errno.h. > > Suggested-by: Willy Tarreau > Link: https://lore.kernel.org/lkml/ZKKdD%2Fp4UkEavru6@1wt.eu/ > Suggested-by: David Laight > Link: https://lore.kernel.org/linux-riscv/94dd5170929f454fbc0a10a2eb3b108d@AcuMS.aculab.com/ > Signed-off-by: Zhangjin Wu > --- > tools/include/nolibc/sys.h | 17 ++++++++++++----- > 1 file changed, 12 insertions(+), 5 deletions(-) > > diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h > index 53bc3ad6593e..3479f54d7957 100644 > --- a/tools/include/nolibc/sys.h > +++ b/tools/include/nolibc/sys.h > @@ -28,13 +28,20 @@ > #include "errno.h" > #include "types.h" > > -/* Syscall return helper, set errno as -ret when ret < 0 */ > + > +/* Syscall return helper for library routines, set errno as -ret when ret is in > + * range of [-MAX_ERRNO, -1] > + * > + * Note, No official reference states the errno range here aligns with musl > + * (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h) > + */ > + > static __inline__ __attribute__((unused, always_inline)) > -long __sysret(long ret) > +long __sysret(unsigned long ret) > { > - if (ret < 0) { > - SET_ERRNO(-ret); > - ret = -1; > + if (ret >= (unsigned long)-MAX_ERRNO) { > + SET_ERRNO(-(long)ret); > + return -1; > } > return ret; > } While running some regression tests, I found that my programs increased in size by 3-4% overall, which was solely attributed to this helper that significantly increased the size of many syscalls (particularly those returning ints). Let's consider this simple function: void unlink_exit(const char *name) { int ret = unlink(name); exit(ret < 0 ? errno : 0); } Before: $ nm --size unlink.o | grep unli 0000000000000030 T unlink_exit $ objdump -d -j .text --disassemble=unlink_exit unlink.o 000000000000003b : 3b: 48 89 fe mov %rdi,%rsi 3e: b8 07 01 00 00 mov $0x107,%eax 43: 31 d2 xor %edx,%edx 45: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi 4c: 0f 05 syscall 4e: 31 ff xor %edi,%edi 50: 85 c0 test %eax,%eax 52: 79 0a jns 5e 54: 89 c7 mov %eax,%edi 56: f7 df neg %edi 58: 89 3d 00 00 00 00 mov %edi,0x0(%rip) # 5e 5e: b8 3c 00 00 00 mov $0x3c,%eax 63: 40 0f b6 ff movzbl %dil,%edi 67: 0f 05 syscall 69: eb fe jmp 69 After: $ nm --size unlink.o | grep unli 0000000000000042 T unlink_exit $ objdump -d -j .text --disassemble=unlink_exit unlink.o 0000000000000051 : 51: 48 89 fe mov %rdi,%rsi 54: b8 07 01 00 00 mov $0x107,%eax 59: 31 d2 xor %edx,%edx 5b: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi 62: 0f 05 syscall 64: 48 63 d0 movslq %eax,%rdx 67: 48 81 fa 00 f0 ff ff cmp $0xfffffffffffff000,%rdx 6e: 76 0a jbe 7a 70: f7 da neg %edx 72: 89 15 00 00 00 00 mov %edx,0x0(%rip) # 78 78: eb 06 jmp 80 7a: 31 ff xor %edi,%edi 7c: 85 c0 test %eax,%eax 7e: 79 06 jns 86 80: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # 86 86: b8 3c 00 00 00 mov $0x3c,%eax 8b: 40 0f b6 ff movzbl %dil,%edi 8f: 0f 05 syscall 91: eb fe jmp 91 => that's 18 bytes added to retrieve the result of a syscall. There are several reasons involved: - the "unsigned long" argument to __sysret() forces a sign extension from all sys_* functions that used to return int (the movslq above). - the comparison with the error range now has to be performed on a long instead of an int - the return value from __sysret() is a long (note, a signed long) which then has to be turned back to an int before being returned by the caller to satisfy the caller's prototype. I could recover a part of it by replacing the __sysret() function with a macro that preserves the input type and avoids these useless conversions: #define __sysret(arg) ({ \ typeof(arg) __sysret_arg = (arg); \ if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \ SET_ERRNO(-(int)__sysret_arg); \ __sysret_arg = -1L; \ } \ __sysret_arg; \ }) But the remaining part is the comparison to -MAX_ERRNO inflicted on all integer returns where we could previously keep a simple sign comparison: 51: 48 89 fe mov %rdi,%rsi 54: b8 07 01 00 00 mov $0x107,%eax 59: 31 d2 xor %edx,%edx 5b: 48 c7 c7 9c ff ff ff mov $0xffffffffffffff9c,%rdi 62: 0f 05 syscall 64: 3d 00 f0 ff ff cmp $0xfffff000,%eax 69: 76 0a jbe 75 6b: f7 d8 neg %eax 6d: 89 05 00 00 00 00 mov %eax,0x0(%rip) # 73 73: eb 06 jmp 7b 75: 31 ff xor %edi,%edi 77: 85 c0 test %eax,%eax 79: 79 06 jns 81 7b: 8b 3d 00 00 00 00 mov 0x0(%rip),%edi # 81 81: b8 3c 00 00 00 mov $0x3c,%eax 86: 40 0f b6 ff movzbl %dil,%edi 8a: 0f 05 syscall 8c: eb fe jmp 8c And given that the vast majority of the syscalls return integers, I think we should specialize these sysret functions so that we don't add needless complexity for all those for which we know they're returning ints (it's written in the caller's prototype anyway). I.e. we can have __sysret_int() that is the low-overhead version and keep __sysret() the expensive one doing the comparison (and possibly through the macro like above if needed in order to avoid multiple casts). I'm not going to change all that now, that's too late, but I'm a bit sad to see my binaries inflate just because of this, so I hope we can get back to this later after the current queue is flushed. Regards, Willy