Date: Wed, 13 Mar 2019 16:19:39 -0400
From: Dennis Zhou
To: Dennis Zhou
Cc: Tejun Heo, Christoph Lameter, Vlad Buslov, kernel-team@fb.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 00/12] introduce percpu block scan_hint
Message-ID: <20190313201939.GA60770@dennisz-mbp.dhcp.thefacebook.com>
References: <20190228021839.55779-1-dennis@kernel.org>
In-Reply-To: <20190228021839.55779-1-dennis@kernel.org>

On Wed, Feb 27, 2019 at 09:18:27PM -0500, Dennis Zhou wrote:
> Hi everyone,
>
> It was reported a while ago [1] that an increase in allocation alignment
> requirement [2] caused the percpu memory allocator to do significantly
> more work.
>
> After spending quite a bit of time diving into it, it seems the crux was
> the following:
>   1) chunk management by free_bytes caused allocations to scan over
>      chunks that could not fit due to fragmentation
>   2) per block fragmentation required scanning from an early first_free
>      bit causing allocations to repeat work
>
> This series introduces a scan_hint for pcpu_block_md and merges the
> paths used to manage the hints. The scan_hint represents the largest
> known free area prior to the contig_hint. There are some caveats to
> this. First, it may not necessarily be the largest area as we do partial
> updates based on freeing of regions and failed scanning in
> pcpu_alloc_area(). Second, if contig_hint == scan_hint, then
> scan_hint_start > contig_hint_start is possible. This is necessary
> for scan_hint discovery when refreshing the hint of a block.
>
> A necessary change is to enforce a block to be the size of a page.
> This lets the management of nr_empty_pop_pages be done by breaking and
> making full contig_hints in the hint update paths. Previously, this was
> done by piggybacking off of refreshing the chunk contig_hint, as it
> performed a full scan and counted empty full pages.
>
> The following are the results found using the workload provided in [3].
>
>   branch        | time
>   ------------------------
>   5.0-rc7       | 69s
>   [2] reverted  | 44s
>   scan_hint     | 39s
>
> The times above represent the approximate average across multiple runs.
> I tested based on a basic 1M 16-byte allocation pattern with no
> alignment requirement and times did not differ between 5.0-rc7 and
> scan_hint.
>
> [1] https://lore.kernel.org/netdev/CANn89iKb_vW+LA-91RV=zuAqbNycPFUYW54w_S=KZ3HdcWPw6Q@mail.gmail.com/
> [2] https://lore.kernel.org/netdev/20181116154329.247947-1-edumazet@google.com/
> [3] https://lore.kernel.org/netdev/vbfzhrj9smb.fsf@mellanox.com/
>
> This patchset contains the following 12 patches:
>   0001-percpu-update-free-path-with-correct-new-free-region.patch
>   0002-percpu-do-not-search-past-bitmap-when-allocating-an-.patch
>   0003-percpu-introduce-helper-to-determine-if-two-regions-.patch
>   0004-percpu-manage-chunks-based-on-contig_bits-instead-of.patch
>   0005-percpu-relegate-chunks-unusable-when-failing-small-a.patch
>   0006-percpu-set-PCPU_BITMAP_BLOCK_SIZE-to-PAGE_SIZE.patch
>   0007-percpu-add-block-level-scan_hint.patch
>   0008-percpu-remember-largest-area-skipped-during-allocati.patch
>   0009-percpu-use-block-scan_hint-to-only-scan-forward.patch
>   0010-percpu-make-pcpu_block_md-generic.patch
>   0011-percpu-convert-chunk-hints-to-be-based-on-pcpu_block.patch
>   0012-percpu-use-chunk-scan_hint-to-skip-some-scanning.patch
>
> 0001 fixes an issue where the chunk contig_hint was being updated
> improperly with the new region's starting offset and possibly differing
> contig_hint. 0002 fixes possibly scanning past the end of the bitmap.
> 0003 introduces a helper to do region overlap comparison.
> 0004 switches to chunk management by contig_hint rather than free_bytes.
> 0005 moves chunks that fail to allocate to the empty block list to
> prevent excess scanning of chunks with small contig_hints and poor
> alignment. 0006 introduces the constraint
> PCPU_BITMAP_BLOCK_SIZE == PAGE_SIZE and modifies nr_empty_pop_pages
> management to be a part of the hint updates. 0007-0009 introduce the
> percpu block scan_hint. 0010 makes pcpu_block_md generic so chunk hints
> can be managed as a pcpu_block_md responsible for more bits. 0011-0012
> add chunk scan_hints.
>
> This patchset is on top of percpu#master a3b22b9f11d9.
>
> diffstats below:
>
> Dennis Zhou (12):
>   percpu: update free path with correct new free region
>   percpu: do not search past bitmap when allocating an area
>   percpu: introduce helper to determine if two regions overlap
>   percpu: manage chunks based on contig_bits instead of free_bytes
>   percpu: relegate chunks unusable when failing small allocations
>   percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE
>   percpu: add block level scan_hint
>   percpu: remember largest area skipped during allocation
>   percpu: use block scan_hint to only scan forward
>   percpu: make pcpu_block_md generic
>   percpu: convert chunk hints to be based on pcpu_block_md
>   percpu: use chunk scan_hint to skip some scanning
>
>  include/linux/percpu.h |  12 +-
>  mm/percpu-internal.h   |  15 +-
>  mm/percpu-km.c         |   2 +-
>  mm/percpu-stats.c      |   5 +-
>  mm/percpu.c            | 547 +++++++++++++++++++++++++++++------------
>  5 files changed, 404 insertions(+), 177 deletions(-)
>
> Thanks,
> Dennis

Applied to percpu/for-5.2.

Thanks,
Dennis
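
For readers skimming the thread, the scan_hint idea described in the cover
letter can be illustrated with a small userspace sketch. This is not the
code from the series. The struct block_md type, BLOCK_BITS, and both
function names are hypothetical, alignment is ignored, and the hints are
assumed to be exact because they are rebuilt by a full scan (the cover
letter notes that the real hints are only partially updated and may be
stale). The sketch records the largest free run as contig_hint and the
largest free run before it as scan_hint, then uses both to decide where a
scan for a given request may start.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCK_BITS 64 /* hypothetical block size, in allocation units */

/* Hypothetical, simplified stand-in for per-block metadata. */
struct block_md {
    bool used[BLOCK_BITS]; /* true = unit is allocated */
    int contig_hint;       /* length of the largest free run */
    int contig_hint_start; /* where that run starts */
    int scan_hint;         /* largest free run before contig_hint, 0 = none */
    int scan_hint_start;   /* where that run starts */
};

/* Rebuild both hints with one left-to-right pass over the block. */
void block_refresh_hints(struct block_md *b)
{
    b->contig_hint = b->contig_hint_start = 0;
    b->scan_hint = b->scan_hint_start = 0;

    for (int i = 0; i < BLOCK_BITS; ) {
        if (b->used[i]) {
            i++;
            continue;
        }
        int start = i;
        while (i < BLOCK_BITS && !b->used[i])
            i++;
        int len = i - start;

        if (len > b->contig_hint) {
            /* The previous best run lies before the new one, so it
             * becomes the largest run prior to contig_hint. */
            if (b->contig_hint > 0) {
                b->scan_hint = b->contig_hint;
                b->scan_hint_start = b->contig_hint_start;
            }
            b->contig_hint = len;
            b->contig_hint_start = start;
        }
    }
}

/* Return the offset to start scanning from, or -1 if nothing can fit. */
int block_scan_start(const struct block_md *b, int need)
{
    assert(need > 0);
    if (need > b->contig_hint)
        return -1;                   /* no free run is large enough */
    if (need > b->scan_hint)
        return b->contig_hint_start; /* nothing earlier can fit */
    return 0;                        /* an earlier run may fit */
}
```

With free runs of 2 units at offset 2, 5 units at offset 10, and 3 units at
offset 20, a request for 3 units skips straight to offset 10 instead of
rescanning the small run at the front of the block.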