From: Linus Torvalds
Date: Wed, 15 Nov 2023 14:09:40 -0500
Subject: Re: [linus:master] [iov_iter] c9eec08bac: vm-scalability.throughput -16.9% regression
To: David Howells
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com, linux-kernel@vger.kernel.org, Christian Brauner, Alexander Viro, Jens Axboe, Christoph Hellwig, Christian Brauner, Matthew Wilcox, David Laight, ying.huang@intel.com, feng.tang@intel.com, fengwei.yin@intel.com
References: <202311061616.cd495695-oliver.sang@intel.com> <3865842.1700061614@warthog.procyon.org.uk> <4007890.1700073334@warthog.procyon.org.uk>

On Wed, 15 Nov 2023 at 13:45, Linus Torvalds wrote:
>
> Do you perhaps have CONFIG_CC_OPTIMIZE_FOR_SIZE set? That makes gcc
> use "rep movsb" - even for small copies that most definitely should
> *not* use "rep movsb".

Just to give some background as an example:

        __builtin_memcpy(dst, src, 24);

with -O2 is done as three 64-bit move instructions (well, three in
each direction, so six instructions total), and with -Os you get

        movl    $6, %ecx
        rep movsl

instead.

And no, this isn't all that uncommon, because things like the above
are what happens when you copy a small structure around.

And that "rep movsl" is indeed nice and small, but it's truly
horrendously bad from a performance angle on most cores, compared to
the six instructions that can schedule nicely and take a cycle or two.

There are some other cases of similar "-Os generates unacceptable
code". For example, dividing by a constant - when you use -Os, gcc
thinks that it's perfectly fine to actually generate a divide
instruction, because it is indeed small. But in most cases you really
*really* want to use a "multiply by reciprocal" even though it
generates bigger code.

Again, it ends up depending on microarchitecture, and modern cores
tend to do better on divides, but it's another of those things where
saving a couple of bytes of code space is not the right choice if it
means that you use a slow divider.

And again, those "divide by constant" cases often happen in implicit
contexts (ie the constant may be the size of a structure, and the
divide is due to taking a pointer difference).

Let's say you have a structure whose size isn't a power of two, but is
(to pick a random but not unlikely value) 56 bytes. The code
generation for -O2 is (value in %rdi)

        movabsq $2635249153387078803, %rax
        shrq    $3, %rdi
        mulq    %rdi

and for -Os you get (value in %rax):

        movl    $56, %ecx
        xorl    %edx, %edx
        divq    %rcx

and that 'divq' is certainly again smaller and more obvious, but again
we're talking "single cycles" vs "potentially 50+ cycles" depending on
uarch.

                Linus
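
For reference, the "multiply by reciprocal" arithmetic in the -O2
sequence above can be sketched in C. This is an illustration only, not
code from the thread: struct item, div56_by_reciprocal(), item_index()
and div56_selfcheck() are hypothetical names, and unsigned __int128 is
a gcc/clang extension rather than standard C. The sketch relies on
56 = 8 * 7 and on the movabsq constant being ceil(2^64 / 7), so the
quotient is the high 64 bits of (x >> 3) * constant (the half that
mulq leaves in %rdx).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 56-byte structure: seven 64-bit fields, not a power of two. */
struct item {
    uint64_t field[7];
};

static_assert(sizeof(struct item) == 56, "struct item must be 56 bytes");

/* ceil(2^64 / 7): the constant loaded by movabsq in the -O2 sequence. */
#define RECIP_7 UINT64_C(2635249153387078803)

/*
 * x / 56 without a divide instruction: shift out the factor of 8, then
 * take the high 64 bits of the 128-bit product with ceil(2^64 / 7).
 */
uint64_t div56_by_reciprocal(uint64_t x)
{
    uint64_t q = x >> 3;                                      /* shrq $3, %rdi */
    unsigned __int128 prod = (unsigned __int128)q * RECIP_7;  /* mulq %rdi */
    return (uint64_t)(prod >> 64);                            /* high half */
}

/*
 * The implicit case: a pointer difference is the byte offset divided by
 * sizeof(struct item), so the constant division never appears in the source.
 */
ptrdiff_t item_index(const struct item *base, const struct item *p)
{
    return p - base;
}

/* Compare against plain division on a few edge values; returns 1 if all match. */
int div56_selfcheck(void)
{
    static const uint64_t samples[] = {
        0, 1, 55, 56, 57, 111, 112, 56000, UINT64_MAX - 56, UINT64_MAX
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        if (div56_by_reciprocal(samples[i]) != samples[i] / 56)
            return 0;
    }
    return 1;
}
```

Compiling a plain "x / 56" with "gcc -O2 -S" and then with "gcc -Os -S"
and comparing the output should show the two sequences quoted above,
though the exact code will depend on the compiler version and target.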