Received: by 10.223.185.116 with SMTP id b49csp4950787wrg; Tue, 27 Feb 2018 05:30:30 -0800 (PST) X-Google-Smtp-Source: AH8x227xXfgbNYoN/RROPIodSBMTFH5SIu+wQ1XVbDX91aXfG0E8FBZ2KGqE2Kc3zD2ddToeORx7 X-Received: by 10.101.90.140 with SMTP id c12mr10142190pgt.56.1519738230782; Tue, 27 Feb 2018 05:30:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1519738230; cv=none; d=google.com; s=arc-20160816; b=I39wfRYO+OiK8E2LgZYi7Y6DHPVrdlu8XKYkkZRnSz6oFdpK1/RrNnmQneKzhtGIUe rnKEznS5/ECQRMMnbuJYO2dCLSQkNr1bnplIRjV8z8Rdf/fjsi3+RffYR3OabthaAZuR J9d8pwfNOxbfSdi8PeOmL9coJdUqeBDyn7BAFZMDMBrFViUEu8ugJd4Y6ZJjoxL6JAbk Zjkluwte0vIE1qNToso9XNL6p9JbrsSe1pm3hOoh0g65e2+MjBntD3TSBbmkNDf14uXk gSBSWoy3P6l72+mI612WYFnS6u2ptJbTUe8MEnRr17s6NtnJAd78WiyiPh9H86UiJni0 Apxw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:date:cc:to:from:subject:message-id :arc-authentication-results; bh=3QOg+4J45kFWMYG3i+go9z9CBexiwlXbV1fdBHrbE6U=; b=Dp5FvaSRDqodWB+TcZwSlDkqvRinZmKVa1BpJwlhnBUOVvQfq5kvUALNEjMC1k4rok b3N9fJlNjOJENQMAzyu+ICNP+JfRtb5BOKACtWvxi0/2/ZU9X8DNZ04Zna+dVJgd/J3x grdqzNP0rVhhRdhVdeL5AzRto7SLzFgIdwEIRmQV5jmLmq7JIgMt6ggHMk8CTnONeTF4 m8+rEe3JW/a7m8Z6AHVI+F2cyrB8Wv61nNrng2CWq485V8gfdenlLBZKmzDFYJrPNfyl NLsIEJxTOnsPcl8btEycNgvcAKKMruVSqMbVsw+k2o0r8NodGFNP/FgRPAWR0HsZl9so 83mg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f32-v6si8507544plf.754.2018.02.27.05.30.15; Tue, 27 Feb 2018 05:30:30 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932128AbeB0N3N (ORCPT + 99 others); Tue, 27 Feb 2018 08:29:13 -0500 Received: from mail-qt0-f195.google.com ([209.85.216.195]:42965 "EHLO mail-qt0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932112AbeB0N3M (ORCPT ); Tue, 27 Feb 2018 08:29:12 -0500 Received: by mail-qt0-f195.google.com with SMTP id t6so13611637qtn.9 for ; Tue, 27 Feb 2018 05:29:12 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:message-id:subject:from:to:cc:date:in-reply-to :references:mime-version:content-transfer-encoding; bh=3QOg+4J45kFWMYG3i+go9z9CBexiwlXbV1fdBHrbE6U=; b=aiyBIGfvRjb/CyLKSvRAE0YsA7fVHy690C+klj757hlE8j2pG/GGbozBdf8yYL7cYf omUn4vMqsCIJum+TyQCZnTM2vsl9Glty42RqDstH4noPqW5bncKlFhpdadwEgXvPY+To g43hvPqoHGpqbfmMNo3mXg0brWqbEOQZMZc1pZpx6WCUEkLOAxdYeKd8u+M/28KJmeRD 17G1bZjKJXxrjL5GTRT7nMHTjruIathn/ZIx8y7vbiS0WmKI3DlHODsSmKZrK023g9FT mc3Tberq7HFY9CTZUCFmcC5LFjVaL8rfOVxWyEyRtlJswSb6SQHHI4g3Rei9NTMHgWHF 0xEQ== X-Gm-Message-State: APf1xPDdKQFKyjdruM86Rd06LPW1TDME45xhdTxEXler3HRmGhBmQ5DN 40X6lkMMWj3AfD7ZBA1Rz8GpjQ== X-Received: by 10.200.66.222 with SMTP id g30mr19103655qtm.184.1519738151422; Tue, 27 Feb 2018 05:29:11 -0800 (PST) Received: from tleilax.poochiereds.net (cpe-2606-A000-1100-DB-0-0-0-C3D.dyn6.twc.com. [2606:a000:1100:db::c3d]) by smtp.gmail.com with ESMTPSA id y124sm2197499qke.40.2018.02.27.05.29.10 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Tue, 27 Feb 2018 05:29:10 -0800 (PST) Message-ID: <1519738149.4300.45.camel@redhat.com> Subject: Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression From: Jeff Layton To: kemi , Ye Xiaolong Cc: David Howells , lkp@01.org, Linus Torvalds , LKML Date: Tue, 27 Feb 2018 08:29:09 -0500 In-Reply-To: <8b48844f-7f9a-a9d7-b5bc-3bc403e0fa78@intel.com> References: <20180225150505.GD7144@yexl-desktop> <1519573271.4702.10.camel@redhat.com> <20180226083807.GE8942@yexl-desktop> <1519645434.4443.15.camel@redhat.com> <1519648433.4443.18.camel@redhat.com> <8b48844f-7f9a-a9d7-b5bc-3bc403e0fa78@intel.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.26.5 (3.26.5-1.fc27) Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 2018-02-27 at 15:42 +0800, kemi wrote: > > On 2018年02月26日 20:33, Jeff Layton wrote: > > On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote: > > > On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote: > > > > On 02/25, Jeff Layton wrote: > > > > > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote: > > > > > > Greeting, > > > > > > > > > > > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit: > > > > > > > > > > > > > > > > > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64") > > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master > > > > > > > > > > > > in testcase: aim7 > > > > > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory > > > > > > with following parameters: > > > > > > > > > > > > disk: 4BRD_12G > > > > > > md: RAID0 > > > > > > fs: xfs > > > > > > test: disk_src > > > > > > load: 3000 > > > > > > cpufreq_governor: performance > > > > > > > > > > > > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system. > > > > > > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/ > > > > > > > > > > > > > > > > > > > > > > I'm a bit suspicious of this result. > > > > > > > > > > This patch only changes inode_cmp_iversion{+raw} (since renamed to > > > > > inode_eq_iversion{+raw}), and that neither should ever be called from > > > > > xfs. The patch is fairly trivial too, and I wouldn't expect a big > > > > > performance hit. > > > > > > > > I tried to queue 4 more times test for both commit c0cef30e4f and its parent, > > > > the result seems quite stable. > > > > > > > > c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e: > > > > "aim7.jobs-per-min": [ > > > > 32964.01, > > > > 32938.68, > > > > 33068.18, > > > > 32886.32, > > > > 32843.72, > > > > 32798.83, > > > > 32898.34, > > > > 32952.55 > > > > ], > > > > > > > > 3da90b159b146672f830bcd2489dd3a1f4e9e089: > > > > "aim7.jobs-per-min": [ > > > > 40239.65, > > > > 40163.33, > > > > 40353.32, > > > > 39976.9, > > > > 40185.75, > > > > 40411.3, > > > > 40213.58, > > > > 39900.69 > > > > ], > > > > > > > > Any other test data you may need? > > > > > > > > > > > > > > Is IMA involved here at all? I didn't see any evidence of it, but the > > > > > kernel config did have it enabled. > > > > > > > > > > > > > Sorry, not quite familiar with IMA, could you tell more about how to check it? > > > > > > > > > > Thanks for retesting it, but I'm at a loss for why we're seeing this: > > > > > > IMA is the the integrity management subsystem. It will use the iversion > > > field to determine whether to remeasure files during remeasurement. It > > > looks like the kernel config has it enabled, but it doesn't look like > > > it's in use, based on the info in the initial report. > > > > > > This patch only affects two inlined functions inode_cmp_iversion and > > > inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points > > > out). These functions are only called from IMA and fs-specific code > > > (usually in readdir implementations to detect directory changes). > > > > > > XFS does not call either of these functions however, so I'm a little > > > unclear on how this patch could slow anything down on this test. The > > > only thing I can think to do here would be to profile this and see what > > > stands out. > > > > > > Note that we do need to keep this in perspective too. This 18% > > > regression on this test follows around a ~230% improvement that occurred > > > when we merged the bulk of these patches. It's should still be quite a > > > bit faster than the v4.15 in this regard. > > > > > > Still, it'd be good to understand what's going on here. > > > > > > > > > > Could we see the dmesg from this boot? It'd be good to confirm that IMA > > is not involved here, as that's the only place that I can see that would > > call into this code at all here. > > > > See attachment for info on dmesg/perf-profile/compare_result. > Feel free to let Xiaolong or me know if anything else you would like to check. > Many thanks, Only one caller of the functions touched by this patch shows up in the profiles: ima_file_free. That calls ima_check_last_writer, which calls inode_cmp_iversion. The lines from the profiles show: 3da90b159b146672f830bcd2489dd3a1f4e9e089: 0.00% 0.00% [kernel.kallsyms] [k] ima_file_free c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e: 0.01% 0.01% [kernel.kallsyms] [k] ima_file_free Seems pretty insignificant, but perhaps that is somehow accounting for the difference. This is called when a file is freed so there could be an effect I guess if we're closing a lot of files for write. Looking at the disassembly from the builds on my box there is some slight difference, since we did alter the implementation. inode->iversion is 0x150 bytes into the struct on my builds: 3da90b159b146672f830bcd2489dd3a1f4e9e089: 0xffffffff813ae858 <+136>: je 0xffffffff813ae871 0xffffffff813ae85a <+138>: mov 0x150(%rbp),%rsi 0xffffffff813ae861 <+145>: mov 0x20(%rax),%rcx 0xffffffff813ae865 <+149>: and $0xfffffffffffffffe,%rsi 0xffffffff813ae869 <+153>: add %rcx,%rcx 0xffffffff813ae86c <+156>: cmp %rcx,%rsi 0xffffffff813ae86f <+159>: je 0xffffffff813ae899 c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e: 0xffffffff813ae828 <+136>: je 0xffffffff813ae83a 0xffffffff813ae82a <+138>: mov 0x150(%rbp),%rcx 0xffffffff813ae831 <+145>: shr %rcx 0xffffffff813ae834 <+148>: cmp %rcx,0x20(%rax) 0xffffffff813ae838 <+152>: je 0xffffffff813ae862 The patched one looks like it ought to be more efficient. It's certainly fewer instructions and it doesn't touch %rsi now. Are shifts more expensive? In any case, what might be good at this point as a sanity check is to turn off CONFIG_IMA and recheck this. That should tell us whether we're on the right track here. With that disabled, this patch should have no effect on anything at all. Thanks, -- Jeff Layton