From: "Huang, Ying" <ying.huang@intel.com>
To: Daniel Jordan
Cc: Andrew Morton, Michal Hocko, Minchan Kim, Tim Chen, Hugh Dickins
Subject: Re: [PATCH] swap: Add percpu cluster_next to reduce lock contention on swap cache
References: <20200514070424.16017-1-ying.huang@intel.com>
 <20200515235140.xkznql332xmqvuf2@ca-dmjordan1.us.oracle.com>
Date: Mon, 18 May 2020 14:37:15 +0800
In-Reply-To: <20200515235140.xkznql332xmqvuf2@ca-dmjordan1.us.oracle.com>
	(Daniel Jordan's message of "Fri, 15 May 2020 19:51:40 -0400")
Message-ID: <87zha5kauc.fsf@yhuang-dev.intel.com>

Daniel Jordan writes:

> On Thu, May 14, 2020 at 03:04:24PM +0800, Huang Ying wrote:
>> And the pmbench score increases 15.9%.
>
> What metric is that, and how long did you run the benchmark for?

I ran the benchmark for 1800 s. The metric comes from the following
pmbench output,

[1] Benchmark done - took 1800.088 sec for 122910000 page access

That is, the throughput is 122910000 / 1800.088 = 68280.0 accesses/s.
The values from the different processes are then summed.

> Given that this thing is probabilistic, did you notice much variance from run
> to run?

The results look quite stable to me: the run-to-run standard deviation
is less than 1%.

>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 35be7a7271f4..9f1343b066c1 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -746,7 +746,16 @@ static int scan_swap_map_slots(struct swap_info_struct *si,
>>  	 */
>>
>>  	si->flags += SWP_SCANNING;
>> -	scan_base = offset = si->cluster_next;
>> +	/*
>> +	 * Use percpu scan base for SSD to reduce lock contention on
>> +	 * cluster and swap cache. For HDD, sequential access is more
>> +	 * important.
>> +	 */
>> +	if (si->flags & SWP_SOLIDSTATE)
>> +		scan_base = this_cpu_read(*si->cluster_next_cpu);
>> +	else
>> +		scan_base = si->cluster_next;
>> +	offset = scan_base;
>>
>>  	/* SSD algorithm */
>>  	if (si->cluster_info) {
>
> It's just a nit but SWP_SOLIDSTATE and 'if (si->cluster_info)' are two ways to
> check the same thing and I'd stick with the one that's already there.

Yes. In effect, (si->flags & SWP_SOLIDSTATE) and (si->cluster_info)
always have the same value, at least for now. But I don't think they
are exactly the same semantically: SWP_SOLIDSTATE describes the device
(non-rotational), while cluster_info means the SSD allocation algorithm
is in use. So I would rather use the check whose semantics match the
intention here.

>> @@ -2962,6 +2979,8 @@ static unsigned long read_swap_header(struct swap_info_struct *p,
>>
>>  	p->lowest_bit = 1;
>>  	p->cluster_next = 1;
>> +	for_each_possible_cpu(i)
>> +		per_cpu(*p->cluster_next_cpu, i) = 1;
>
> These are later overwritten if the device is an SSD which seems to be the only
> case where these are used, so why have this?

Yes, you are right. I will remove this in future versions.

>>  	p->cluster_nr = 0;
>>
>>  	maxpages = max_swapfile_size();
>> @@ -3204,6 +3223,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>  		 * SSD
>>  		 */
>>  		p->cluster_next = 1 + prandom_u32_max(p->highest_bit);
>> +		for_each_possible_cpu(cpu) {
>> +			per_cpu(*p->cluster_next_cpu, cpu) =
>> +				1 + prandom_u32_max(p->highest_bit);
>> +		}
>
> Is there a reason for adding one? The history didn't enlighten me about why
> cluster_next does it.

The first swap slot holds the swap partition header, so allocation
starts at offset 1. You can find the corresponding code in the swapon
syscall, below the comment "Read the swap header.".

Best Regards,
Huang, Ying
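
P.S. To make the mechanism concrete, here is how the pieces fit
together, reduced to a minimal sketch. The struct fields beyond the
quoted hunks, the alloc_percpu() call, and the two helper functions
(init_cluster_next_cpu(), pick_scan_base()) are illustrative
reconstructions, not the actual code layout of the patch; error
handling and locking are omitted.

#include <linux/percpu.h>	/* alloc_percpu(), this_cpu_read(), per_cpu() */
#include <linux/random.h>	/* prandom_u32_max() */

/* Abbreviated: the real struct lives in <linux/swap.h>, as does SWP_SOLIDSTATE. */
struct swap_info_struct {
	unsigned long flags;			  /* SWP_SOLIDSTATE etc. */
	unsigned int highest_bit;		  /* largest usable swap offset */
	unsigned int cluster_next;		  /* shared scan base (HDD) */
	unsigned int __percpu *cluster_next_cpu;  /* per-CPU scan base (SSD) */
	/* ... many more fields in the real struct ... */
};

/* swapon path (illustrative helper): give every CPU its own random start. */
static int init_cluster_next_cpu(struct swap_info_struct *p)
{
	int cpu;

	p->cluster_next_cpu = alloc_percpu(unsigned int);
	if (!p->cluster_next_cpu)
		return -ENOMEM;
	for_each_possible_cpu(cpu)
		/* slot 0 holds the swap header, so usable offsets start at 1 */
		per_cpu(*p->cluster_next_cpu, cpu) =
			1 + prandom_u32_max(p->highest_bit);
	return 0;
}

/* scan_swap_map_slots() path (illustrative helper): choose the scan base. */
static unsigned int pick_scan_base(struct swap_info_struct *si)
{
	if (si->flags & SWP_SOLIDSTATE)
		return this_cpu_read(*si->cluster_next_cpu);
	return si->cluster_next;	/* keep HDD access sequential */
}

Because every CPU starts scanning from its own random offset, concurrent
allocations tend to land in different clusters, which is what reduces
the contention on the cluster and swap cache locks.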