From: Josef Bacik
To: axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, tj@kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 00/13] Introduce io.latency io controller for cgroups
Date: Tue, 29 May 2018 17:17:11 -0400
Message-Id: <20180529211724.4531-1-josef@toxicpanda.com>

This series adds a latency-based io controller for cgroups.  It is based on
the same concept as the writeback throttling code: watch the overall latency
of IOs in a given window and adjust the queue depth of the group accordingly.

This is meant to be a workload protection controller, so whoever has the
lowest latency target gets preferential treatment, with no thought to fairness
or proportionality.  It is meant to be work conserving: as long as nobody is
missing their latency target, the disk is fair game.

We have been testing this in production for several months now to get the
behavior right, and we are finally at the point where it works well in all of
our test cases.  With this patchset we protect our main workload (the web
server) and isolate the system services (chef/yum/etc).  This works well in
the normal case, smoothing out the weird requests-per-second (RPS) dips we
would see when one of the system services ran and competed for IO resources.
It also works incredibly well in the runaway task case.

The runaway task use case is one where some task slowly eats up all of the
memory on the system (think a memory leak).  Previously this sort of workload
would push the box into a swapping/OOM death spiral that could only be
recovered from by rebooting the box.  With this patchset and proper
configuration of the memory.low and io.latency controllers, we are able to
survive this test with at most a 20% dip in RPS.

There are a lot of extra patches in here to set everything up.
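To give a feel for what the protection setup looks like from user space,
here is a minimal sketch of a configuration helper.  It assumes a cgroup2
hierarchy mounted at /sys/fs/cgroup, an already-created "workload" cgroup,
and a disk at 8:16; the latency target and memory.low values are
placeholders, and the exact io.latency syntax and units are described in
the documentation patch (13/13), not here.

	/*
	 * Sketch only: configure io.latency and memory.low for the
	 * protected workload.  Paths, device numbers, and values are
	 * placeholders for illustration.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>

	static int write_file(const char *path, const char *val)
	{
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror(path);
			return -1;
		}
		if (write(fd, val, strlen(val)) < 0) {
			perror(path);
			close(fd);
			return -1;
		}
		close(fd);
		return 0;
	}

	int main(void)
	{
		/* Latency target for the protected workload on device 8:16. */
		write_file("/sys/fs/cgroup/workload/io.latency",
			   "8:16 target=10000");
		/* Reserve memory so the workload isn't pushed into swap. */
		write_file("/sys/fs/cgroup/workload/memory.low",
			   "8589934592");
		return 0;
	}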
The following are just infrastructure and should be relatively
uncontroversial:

[PATCH 01/13] block: add bi_blkg to the bio for cgroups
[PATCH 02/13] block: introduce bio_issue_as_root_blkg
[PATCH 03/13] blk-cgroup: allow controllers to output their own stats

The following simply allow us to tag swap IO and assign the appropriate cgroup
to the bios so we can do the appropriate accounting inside the io controller:

[PATCH 04/13] blk: introduce REQ_SWAP
[PATCH 05/13] swap,blkcg: issue swap io with the appropriate context

These are so that we can induce delays.  The io controller mostly throttles
based on queue depth; however, for cases like REQ_SWAP/REQ_META, where we
cannot throttle without inducing a priority inversion, we have a mechanism to
"back charge" groups for this IO by inducing an artificial delay when the task
returns to user space (a rough model of this idea is sketched at the end of
this mail).

[PATCH 06/13] blkcg: add generic throttling mechanism
[PATCH 07/13] memcontrol: schedule throttling if we are congested

This is more moving things around and refactoring.  Jens, you may want to pay
close attention to this to make sure I didn't break anything.

[PATCH 08/13] blk-stat: export helpers for modifying blk_rq_stat
[PATCH 09/13] blk-rq-qos: refactor out common elements of blk-wbt
[PATCH 10/13] block: remove external dependency on wbt_flags
[PATCH 11/13] rq-qos: introduce dio_bio callback

And this is the meat of the controller and its documentation:

[PATCH 12/13] block: introduce blk-iolatency io controller
[PATCH 13/13] Documentation: add a doc for blk-iolatency

Jens, I'm sending this through your tree since it's mostly block related;
however, there are the two mm related patches, so if somebody from mm could
weigh in on how we want to handle those, that would be great.

Thanks,

Josef
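P.S. For anyone who has not looked at the throttling patch yet, here is a
rough, self-contained user-space model of the back-charge idea mentioned
above for patch 06/13.  The names (iolat_group, charge_delay,
throttle_on_return) are made up for illustration and do not match the
kernel code; the real mechanism lives in the blkcg patches.

	/*
	 * Toy model: when an IO cannot be throttled at submission time
	 * (e.g. swap or metadata IO), charge the group an artificial delay
	 * instead, and pay it off the next time a task in that group
	 * "returns to user space".  Not the kernel implementation.
	 */
	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>

	struct iolat_group {
		/* Nanoseconds of delay owed by this group. */
		atomic_uint_fast64_t delay_owed_ns;
	};

	/* Called when an unthrottleable IO is issued for the group. */
	static void charge_delay(struct iolat_group *grp, uint64_t ns)
	{
		atomic_fetch_add(&grp->delay_owed_ns, ns);
	}

	/* Called at the point that models return to user space. */
	static void throttle_on_return(struct iolat_group *grp)
	{
		uint64_t owed = atomic_exchange(&grp->delay_owed_ns, 0);
		struct timespec ts = {
			.tv_sec = owed / 1000000000ULL,
			.tv_nsec = owed % 1000000000ULL,
		};

		if (owed)
			nanosleep(&ts, NULL);
	}

	int main(void)
	{
		struct iolat_group grp = { 0 };

		/* Pretend two swap writes were issued unthrottled. */
		charge_delay(&grp, 5 * 1000 * 1000);	/* 5ms */
		charge_delay(&grp, 5 * 1000 * 1000);	/* 5ms */

		/* The offending task pays the delay back before running on. */
		throttle_on_return(&grp);
		printf("paid back the charged delay\n");
		return 0;
	}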