From: Dennis Zhou
To: Dennis Zhou, Tejun Heo, Christoph Lameter
Cc: Vlad Buslov, kernel-team@fb.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 00/12] introduce percpu block scan_hint
Date: Wed, 27 Feb 2019 21:18:27 -0500
Message-Id: <20190228021839.55779-1-dennis@kernel.org>

Hi everyone,

It was reported a while ago [1] that an increase in the allocation alignment requirement [2] caused the percpu memory allocator to do significantly more work.

After spending quite a bit of time diving into it, it seems the crux was the following:
  1) chunk management by free_bytes caused allocations to scan over chunks that could not fit due to fragmentation
  2) per-block fragmentation required scanning from an early first_free bit, causing allocations to repeat work

This series introduces a scan_hint for pcpu_block_md and merges the paths used to manage the hints. The scan_hint represents the largest known free area prior to the contig_hint. There are some caveats to this. First, it is not necessarily the largest area, as we do partial updates based on the freeing of regions and on failed scanning in pcpu_alloc_area(). Second, if contig_hint == scan_hint, then scan_hint_start > contig_hint_start is possible. This is necessary for scan_hint discovery when refreshing the hint of a block.

A necessary change is to require a block to be the size of a page. This lets nr_empty_pop_pages be managed by breaking and making full contig_hints in the hint update paths. Previously, this was done by piggybacking off of refreshing the chunk contig_hint, as that performed a full scan and counted empty full pages.
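The scan_hint skip and the one-page-per-block accounting can be illustrated with a small C sketch. This is a hedged illustration, not the kernel's code: struct block_md_sketch, next_scan_start(), update_empty_pages() and the SKETCH_* constants (4 KiB pages, 4-byte minimum allocation unit) are hypothetical, and the skip rule assumes the hints are maintained so that no free area before scan_hint_start is larger than scan_hint.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumptions for this sketch only: 4 KiB pages and a 4-byte minimum
 * allocation unit, so one block (== one page) covers 1024 bits.
 */
#define SKETCH_PAGE_SIZE      4096
#define SKETCH_MIN_ALLOC_SIZE 4
#define SKETCH_BLOCK_BITS     (SKETCH_PAGE_SIZE / SKETCH_MIN_ALLOC_SIZE)

/* Hypothetical stand-in for per-block hint metadata; offsets in bits. */
struct block_md_sketch {
	int first_free;        /* first free bit in the block */
	int contig_hint;       /* largest known free area */
	int contig_hint_start; /* start of that area */
	int scan_hint;         /* largest known free area before contig_hint */
	int scan_hint_start;   /* start of the scan_hint area */
};

/*
 * Return the bit offset where an allocation of alloc_bits should start
 * scanning. Assuming no free area before scan_hint_start exceeds
 * scan_hint, a request larger than scan_hint cannot fit anywhere up to
 * scan_hint_start + scan_hint, so the scan may begin there. The start
 * comparison excludes the case scan_hint == contig_hint with the
 * scan_hint lying after the contig_hint.
 */
int next_scan_start(const struct block_md_sketch *b, int alloc_bits)
{
	assert(alloc_bits > 0);

	if (b->scan_hint > 0 &&
	    b->contig_hint_start > b->scan_hint_start &&
	    alloc_bits > b->scan_hint)
		return b->scan_hint_start + b->scan_hint;

	return b->first_free;
}

/*
 * With one block per page, a populated page is empty exactly when its
 * block's contig_hint spans the whole block. Breaking or making a full
 * contig_hint adjusts the empty-page count without a full rescan.
 */
void update_empty_pages(int *nr_empty_pop_pages, int old_hint, int new_hint)
{
	bool was_empty = old_hint == SKETCH_BLOCK_BITS;
	bool is_empty = new_hint == SKETCH_BLOCK_BITS;

	if (was_empty && !is_empty)
		(*nr_empty_pop_pages)--;     /* full hint broken */
	else if (!was_empty && is_empty)
		(*nr_empty_pop_pages)++;     /* full hint made */
}
```

For a block with scan_hint = 8 at bit 20 and contig_hint = 64 at bit 100, a 16-bit request starts scanning at bit 28, while an 8-bit request still starts at first_free.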
The following are the results found using the workload provided in [3].

    branch        | time
    ---------------------
    5.0-rc7       | 69s
    [2] reverted  | 44s
    scan_hint     | 39s

The times above represent the approximate average across multiple runs. I also tested a basic pattern of 1M 16-byte allocations with no alignment requirement, and times did not differ between 5.0-rc7 and scan_hint.

[1] https://lore.kernel.org/netdev/CANn89iKb_vW+LA-91RV=zuAqbNycPFUYW54w_S=KZ3HdcWPw6Q@mail.gmail.com/
[2] https://lore.kernel.org/netdev/20181116154329.247947-1-edumazet@google.com/
[3] https://lore.kernel.org/netdev/vbfzhrj9smb.fsf@mellanox.com/

This patchset contains the following 12 patches:
  0001-percpu-update-free-path-with-correct-new-free-region.patch
  0002-percpu-do-not-search-past-bitmap-when-allocating-an-.patch
  0003-percpu-introduce-helper-to-determine-if-two-regions-.patch
  0004-percpu-manage-chunks-based-on-contig_bits-instead-of.patch
  0005-percpu-relegate-chunks-unusable-when-failing-small-a.patch
  0006-percpu-set-PCPU_BITMAP_BLOCK_SIZE-to-PAGE_SIZE.patch
  0007-percpu-add-block-level-scan_hint.patch
  0008-percpu-remember-largest-area-skipped-during-allocati.patch
  0009-percpu-use-block-scan_hint-to-only-scan-forward.patch
  0010-percpu-make-pcpu_block_md-generic.patch
  0011-percpu-convert-chunk-hints-to-be-based-on-pcpu_block.patch
  0012-percpu-use-chunk-scan_hint-to-skip-some-scanning.patch

0001 fixes an issue where the chunk contig_hint was being updated improperly with the new region's starting offset and possibly differing contig_hint.

0002 fixes a possible scan past the end of the bitmap.

0003 introduces a helper to do region overlap comparison.

0004 switches to chunk management by contig_hint rather than free_bytes.

0005 moves chunks that fail to allocate to the empty block list to prevent excess scanning of chunks with small contig_hints and poor alignment.
0006 introduces the constraint PCPU_BITMAP_BLOCK_SIZE == PAGE_SIZE and modifies nr_empty_pop_pages management to be a part of the hint updates.

0007-0009 introduce the percpu block scan_hint.

0010 makes pcpu_block_md generic so chunk hints can be managed as a pcpu_block_md responsible for more bits.

0011-0012 add chunk scan_hints.

This patchset is on top of percpu#master a3b22b9f11d9.

diffstats below:

Dennis Zhou (12):
  percpu: update free path with correct new free region
  percpu: do not search past bitmap when allocating an area
  percpu: introduce helper to determine if two regions overlap
  percpu: manage chunks based on contig_bits instead of free_bytes
  percpu: relegate chunks unusable when failing small allocations
  percpu: set PCPU_BITMAP_BLOCK_SIZE to PAGE_SIZE
  percpu: add block level scan_hint
  percpu: remember largest area skipped during allocation
  percpu: use block scan_hint to only scan forward
  percpu: make pcpu_block_md generic
  percpu: convert chunk hints to be based on pcpu_block_md
  percpu: use chunk scan_hint to skip some scanning

 include/linux/percpu.h |  12 +-
 mm/percpu-internal.h   |  15 +-
 mm/percpu-km.c         |   2 +-
 mm/percpu-stats.c      |   5 +-
 mm/percpu.c            | 547 +++++++++++++++++++++++++++++------------
 5 files changed, 404 insertions(+), 177 deletions(-)

Thanks,
Dennis
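The overlap helper from 0003 and the switch in 0004 from free_bytes to contig_hint can be illustrated with a hedged C sketch. region_overlap() shows the standard half-open interval test that such a helper performs. slot_for_contig_bits() is a hypothetical bucketing function, not the kernel's slot mapping. It shows why keying chunk lists on the largest contiguous free run keeps fragmented chunks away from requests they cannot satisfy.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Overlap test for half-open bit ranges [a, b) and [x, y): two such
 * ranges overlap iff each one starts before the other one ends.
 */
bool region_overlap(int a, int b, int x, int y)
{
	assert(a <= b && x <= y); /* well-formed ranges only */
	return a < y && x < b;
}

/*
 * Hypothetical list-slot choice keyed on contig_bits (the largest free
 * run) instead of total free space. Slot = bit length of contig_bits,
 * i.e. floor(log2(contig_bits)) + 1, with slot 0 meaning "no free area".
 * A chunk whose free space is split into small holes lands in a low
 * slot, so large requests do not scan it.
 */
int slot_for_contig_bits(int contig_bits)
{
	int slot = 0;

	assert(contig_bits >= 0);
	while (contig_bits > 0) {
		contig_bits >>= 1;
		slot++;
	}
	return slot;
}
```

For example, a chunk with 100 free bits split into 25 four-bit holes has contig_bits == 4 and lands in slot 3. Applying the same bucketing to its total free bits would have placed it in slot 7, alongside chunks that really have a 100-bit run.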