From: Dennis Zhou
To: Jens Axboe, Tejun Heo, Johannes Weiner, Josef Bacik
Cc: kernel-team@fb.com, linux-block@vger.kernel.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 00/15] blkcg ref count refactor/cleanup + blkcg avg_lat
Date: Thu, 30 Aug 2018 21:53:41 -0400
Message-Id: <20180831015356.69796-1-dennisszhou@gmail.com>

Hi everyone,

This is a fairly lengthy patchset that aims to clean up reference
counting for blkcgs and blkgs. There are 4 problems that this patchset
tries to address:
  1. fix blkcg destruction
  2. always associate a bio with a blkg
  3. remove the extra css ref held by bios and utilize the blkg ref
  4. add average latency tracking to blkcg core in io.stat

First, there is a regression in blkcg destruction where references
weren't properly put, causing blkcgs to never be destroyed. Previously,
blkgs were destroyed during offlining of the blkcg. This puts back the
blkcg reference a blkg holds, allowing the blkcg ref to reach zero. Then,
blkcg_css_free() is called as part of the final cleanup.
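
For readers who want a concrete picture of the lifecycle described above,
here is a minimal userspace sketch. It is not the kernel code: the toy_*
names and fields are hypothetical stand-ins, the counter is a plain int
with no locking, and the "freed" flag only marks the point where
blkcg_css_free() would run. It shows why a blkg that is never destroyed
keeps its blkcg pinned forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct toy_blkcg {
	int refcnt;	/* one base ref plus one per live blkg */
	bool freed;	/* set where blkcg_css_free() would run */
};

struct toy_blkg {
	struct toy_blkcg *blkcg;	/* pinned for the blkg's lifetime */
	bool destroyed;
};

/* Drop one reference; the last put triggers the final cleanup. */
static void toy_blkcg_put(struct toy_blkcg *blkcg)
{
	assert(blkcg->refcnt > 0);
	if (--blkcg->refcnt == 0)
		blkcg->freed = true;
}

/* Creating a blkg takes a reference on its blkcg. */
static void toy_blkg_create(struct toy_blkg *blkg, struct toy_blkcg *blkcg)
{
	blkcg->refcnt++;
	blkg->blkcg = blkcg;
	blkg->destroyed = false;
}

/* Destroying a blkg puts back the blkcg reference it held. */
static void toy_blkg_destroy(struct toy_blkg *blkg)
{
	if (blkg->destroyed)
		return;
	blkg->destroyed = true;
	toy_blkcg_put(blkg->blkcg);
}
```

In this model, dropping the blkcg's base reference while two blkgs are
still alive does not free it; the blkcg is only freed once the last blkg
has been destroyed.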

To address the first problem, 0001 reverts the broken commit, 0002
delays blkg destruction until writeback has finished, and 0003 closes
the window on a race between a css migrating or dying and blkg
association. This should fix the issue where blkg_get() was getting
called when a blkcg had already begun exiting. If a bio finds itself
here, it will just fall back to root. Oddly enough, at one point
blk-throttle was using policy data from one blkg while associating with
a potentially different one, which is how this was exposed. 0004 also
addresses a similar problem with task_css(current, ...), where
association tries to get the css of a task that is migrating while its
cgroup is dying.

Second, both blk-throttle and blk-iolatency rely on blkg association to
enable their policies. Rather than having each policy (and future
policies) implement this logic independently, this consolidates it so
that all bios are tagged with a blkg.

Third, now that a bio always holds a blkg reference, the blkcg can be
referenced through it rather than maintaining an additional pointer and
reference. So let's clean this up.

Finally, it seems rather useful to know on average how well IOs are
doing per cgroup. This adds the average latency statistic to core, where
it encompasses IOs from all descendants.
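
To illustrate what "encompasses IOs from all descendants" means for the
average latency statistic, here is a small userspace sketch. Everything
in it is an assumption made for illustration: the lat_* names are
hypothetical, and it uses a plain arithmetic mean, whereas the averaging
method actually used in 0015 is not described in this cover letter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup latency accumulator; not the kernel's layout. */
struct lat_node {
	struct lat_node *parent;	/* NULL for the root */
	uint64_t total_ns;		/* summed latency, self + descendants */
	uint64_t nr_ios;		/* completed IOs, self + descendants */
};

/* Charge one completed IO to the node and to every ancestor. */
static void lat_record(struct lat_node *node, uint64_t lat_ns)
{
	for (; node; node = node->parent) {
		node->total_ns += lat_ns;
		node->nr_ios++;
	}
}

/* Arithmetic mean (an assumption); 0 when no IO has completed yet. */
static uint64_t lat_avg_ns(const struct lat_node *node)
{
	assert(node != NULL);
	return node->nr_ios ? node->total_ns / node->nr_ios : 0;
}
```

Because every completion is charged up the parent chain, a parent's
average reflects its own IOs together with those of all its descendants,
while a leaf's average reflects only its own.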

This patchset contains the following 15 patches:
  0001-Revert-blk-throttle-fix-race-between-blkcg_bio_issue.patch
  0002-blkcg-delay-blkg-destruction-until-after-writeback-h.patch
  0003-blkcg-use-tryget-logic-when-associating-a-blkg-with-.patch
  0004-blkcg-fix-ref-count-issue-with-bio_blkcg-using-task_.patch
  0005-blkcg-update-blkg_lookup_create-to-do-locking.patch
  0006-blkcg-always-associate-a-bio-with-a-blkg.patch
  0007-blkcg-consolidate-bio_issue_init-and-blkg-associatio.patch
  0008-blkcg-associate-a-blkg-for-pages-being-evicted-by-sw.patch
  0009-blkcg-associate-writeback-bios-with-a-blkg.patch
  0010-blkcg-remove-bio-bi_css-and-instead-use-bio-bi_blkg.patch
  0011-blkcg-remove-additional-reference-to-the-css.patch
  0012-blkcg-cleanup-and-make-blk_get_rl-use-blkg_lookup_cr.patch
  0013-blkcg-change-blkg-reference-counting-to-use-percpu_r.patch
  0014-blkcg-rename-blkg_try_get-to-blkg_tryget.patch
  0015-blkcg-add-average-latency-tracking-to-blk-cgroup.patch

0001-0003 address the regression in the blkcg cleanup path.
0004 fixes a small window race condition with task_css(current, ...).
0005 is a preparatory patch that cleans up blkg lookup create.
0006-0009 associate all bios with a blkg: regular IO, swap, writeback.
0010 removes the extra css pointer in bios.
0011 removes the implicit reference left behind in 0010.
0012 cleans up blk_get_rl, making use of the new blkg ref.
0013 changes blkg ref counting from atomic to percpu.
0014 renames blkg_try_get and makes it consistent with css_tryget.
0015 adds average latency tracking as part of blk-cgroup.

This patchset is on top of axboe#for-4.19/block #b86d865cb1ca.
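
As an illustration of the tryget logic introduced in 0003 (and renamed
in 0014), together with the fall-back-to-root behavior, here is a
userspace sketch using C11 atomics. The ref_* names are hypothetical and
the single atomic counter is a simplification; it is not the kernel
implementation and it does not model the percpu_ref conversion in 0013.
It also assumes the root blkg cannot go away while bios are in flight.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel structures. */
struct ref_blkg {
	atomic_int refcnt;	/* 0 means the blkg has already begun exiting */
};

struct ref_bio {
	struct ref_blkg *blkg;
};

static void ref_blkg_init(struct ref_blkg *blkg, int refs)
{
	assert(refs >= 0);
	atomic_init(&blkg->refcnt, refs);
}

/* Take a reference only if the count has not already dropped to zero. */
static bool ref_blkg_tryget(struct ref_blkg *blkg)
{
	int old = atomic_load(&blkg->refcnt);

	while (old > 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&blkg->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Associate a bio with blkg, or with root if blkg is already dying. */
static struct ref_blkg *ref_bio_associate(struct ref_bio *bio,
					  struct ref_blkg *blkg,
					  struct ref_blkg *root)
{
	if (ref_blkg_tryget(blkg)) {
		bio->blkg = blkg;
	} else {
		/* Assumption: root stays alive, so a plain get is safe. */
		atomic_fetch_add(&root->refcnt, 1);
		bio->blkg = root;
	}
	return bio->blkg;
}
```

The point of the compare-and-swap loop is that an unconditional get can
resurrect a count that has already reached zero, whereas tryget refuses
and lets the caller pick the root instead.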

diffstats below:

Dennis Zhou (Facebook) (15):
  Revert "blk-throttle: fix race between blkcg_bio_issue_check() and
    cgroup_rmdir()"
  blkcg: delay blkg destruction until after writeback has finished
  blkcg: use tryget logic when associating a blkg with a bio
  blkcg: fix ref count issue with bio_blkcg using task_css
  blkcg: update blkg_lookup_create to do locking
  blkcg: always associate a bio with a blkg
  blkcg: consolidate bio_issue_init and blkg association
  blkcg: associate a blkg for pages being evicted by swap
  blkcg: associate writeback bios with a blkg
  blkcg: remove bio->bi_css and instead use bio->bi_blkg
  blkcg: remove additional reference to the css
  blkcg: cleanup and make blk_get_rl use blkg_lookup_create
  blkcg: change blkg reference counting to use percpu_ref
  blkcg: rename blkg_try_get to blkg_tryget
  blkcg: add average latency tracking to blk-cgroup

 Documentation/admin-guide/cgroup-v2.rst |  14 +-
 block/bio.c                             | 187 +++++++++++---
 block/blk-cgroup.c                      | 298 +++++++++++++++++-------
 block/blk-iolatency.c                   |  26 +--
 block/blk-throttle.c                    |  12 +-
 block/bounce.c                          |   4 +-
 drivers/block/loop.c                    |   5 +-
 drivers/md/raid0.c                      |   2 +-
 fs/buffer.c                             |  10 +-
 fs/ext4/page-io.c                       |   2 +-
 include/linux/bio.h                     |  23 +-
 include/linux/blk-cgroup.h              | 167 ++++++++-----
 include/linux/blk_types.h               |   1 -
 include/linux/cgroup.h                  |   2 +
 include/linux/writeback.h               |   5 +-
 kernel/cgroup/cgroup.c                  |   4 +-
 kernel/trace/blktrace.c                 |   4 +-
 mm/backing-dev.c                        |   5 +
 mm/page_io.c                            |   2 +-
 19 files changed, 520 insertions(+), 253 deletions(-)

Thanks,
Dennis