Date: Mon, 24 Aug 2020 17:56:53 +0100
From: Mel Gorman
To: Borislav Petkov
Cc: Feng Tang, "Luck, Tony", kernel test robot, LKML, lkp@lists.01.org
Subject: Re: [LKP] Re: [x86/mce] 1de08dccd3: will-it-scale.per_process_ops -14.1% regression
Message-ID: <20200824165653.GQ2976@suse.com>
References: <20200425114414.GU26573@shao2-debian>
 <20200425130136.GA28245@zn.tnic>
 <20200818082943.GA65567@shbuild999.sh.intel.com>
 <20200818200654.GA21494@agluck-desk2.amr.corp.intel.com>
 <20200819020437.GA2605@shbuild999.sh.intel.com>
 <20200821020259.GA90000@shbuild999.sh.intel.com>
 <20200824151425.GF4794@zn.tnic>
 <20200824153300.GA56944@shbuild999.sh.intel.com>
 <20200824161238.GI4794@zn.tnic>
In-Reply-To: <20200824161238.GI4794@zn.tnic>

On Mon, Aug 24, 2020 at 06:12:38PM +0200, Borislav Petkov wrote:
> > :) Right, this is what I'm doing right now. Some test job is queued on
> > the test box, and it may need some iterations of a new patch. Hopefully
> > we can isolate some specific variable given some luck.
>
> ... yes, exactly, you need to identify the contention where this happens,
> causing a cacheline to bounce, or a variable straddling a cacheline
> boundary, causing the read to fetch two cachelines and thus causing that
> slowdown. And then align that var to the beginning of a cacheline.
>

Given the test is malloc1, it *may* be struct per_cpu_pages embedded within
per_cpu_pageset. The cache characteristics of per_cpu_pageset are terrible
because of how it mixes up zone counters and per-cpu lists. However, if the
first per_cpu_pageset is cache-aligned then every second per_cpu_pages will
be cache-aligned and half of the lists will fit in one cache line. If the
whole structure gets pushed out of alignment then all per_cpu_pages straddle
cache lines, increasing the overall cache footprint and potentially causing
problems if the cache is not large enough to hold the hot structures.

The misses could potentially be inferred without c2c by looking at
perf -e cache-misses on a good and a bad kernel and seeing if there is a
noticeable increase in misses in mm/page_alloc.c, with a focus on anything
using the per-cpu lists.

Whether the problem is per_cpu_pages or some other structure, it's not
struct mce's fault in all likelihood -- it's just the messenger.

--
Mel Gorman
SUSE Labs
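
To make the straddling Borislav describes concrete, here is a minimal
userspace sketch. It is not kernel code: the struct and field names are
invented and a 64-byte cache line is assumed. It shows how a hot member
that happens to start near the end of a line is fetched from two lines,
and how forcing it onto a line boundary (roughly what the kernel's
____cacheline_aligned annotation does) avoids that:

#include <stddef.h>
#include <stdio.h>

#define CACHELINE 64

/* A hot pair of fields that is always read together. */
struct hot_pair {
	long a;
	long b;
};

/*
 * The hot member starts at byte 56 and ends at byte 71, so every read
 * of it touches two 64-byte cache lines.
 */
struct bad_layout {
	char other[56];
	struct hot_pair hot;
};

/*
 * Same fields, but the hot member is forced to the start of the next
 * cache line, so a read touches exactly one line (at the cost of some
 * padding inserted by the compiler).
 */
struct good_layout {
	char other[56];
	struct hot_pair hot __attribute__((aligned(CACHELINE)));
};

static int straddles(size_t off, size_t size)
{
	return off / CACHELINE != (off + size - 1) / CACHELINE;
}

int main(void)
{
	printf("bad : offset %3zu straddles=%d\n",
	       offsetof(struct bad_layout, hot),
	       straddles(offsetof(struct bad_layout, hot),
			 sizeof(struct hot_pair)));
	printf("good: offset %3zu straddles=%d\n",
	       offsetof(struct good_layout, hot),
	       straddles(offsetof(struct good_layout, hot),
			 sizeof(struct hot_pair)));
	return 0;
}

On a typical x86-64 ABI this prints offset 56 with straddles=1 for the bad
layout and offset 64 with straddles=0 for the good one; the fix costs only
the padding needed to reach the boundary.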
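
Mel's observation that the alignment of the first per_cpu_pageset decides
whether every second per_cpu_pages lands on a cache line can also be
sketched with a toy model. The sizes below are invented purely for
illustration (the real per_cpu_pageset and per_cpu_pages layouts differ
and change between kernel versions); the only point is how shifting the
base of a packed per-CPU array changes how many embedded copies start on
a line boundary:

#include <stdio.h>

#define CACHELINE    64
#define NR_CPUS      8
#define PAGESET_SIZE 96	/* pretend sizeof(struct per_cpu_pageset) */
#define PCP_OFFSET   0	/* pretend offsetof(struct per_cpu_pageset, pcp) */

static void show(const char *name, long base)
{
	int cpu, aligned = 0;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		long pcp = base + (long)cpu * PAGESET_SIZE + PCP_OFFSET;

		if (pcp % CACHELINE == 0)
			aligned++;
	}
	printf("%-7s base=%2ld: %d of %d pcp copies start on a cache line\n",
	       name, base, aligned, NR_CPUS);
}

int main(void)
{
	show("aligned", 0);	/* first pageset cache-aligned */
	show("shifted", 40);	/* whole per-CPU block pushed off alignment */
	return 0;
}

With these made-up numbers an aligned base leaves 4 of the 8 copies
starting on a cache line while a 40-byte shift leaves none, which is the
shape of the regression being described: nothing in the page allocator
changed, only where its per-CPU data happened to land.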