Date: Mon, 2 Jul 2018 17:38:05 -0400
From: Josef Bacik
To: Andrew Morton
Cc: Josef Bacik, axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org, hannes@cmpxchg.org, tj@kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 00/14][V5] Introduce io.latency io controller for cgroups
Message-ID: <20180702213804.vcfr4auvunpkd2ky@destiny>
References: <20180629192542.26649-1-josef@toxicpanda.com> <20180702142639.752759da566fd9074cf8edfe@linux-foundation.org>
In-Reply-To: <20180702142639.752759da566fd9074cf8edfe@linux-foundation.org>

On Mon, Jul 02, 2018 at 02:26:39PM -0700, Andrew Morton wrote:
> On Fri, 29 Jun 2018 15:25:28 -0400 Josef Bacik wrote:
>
> > This series adds a latency based io controller for cgroups.
> > It is based on the same concept as the writeback throttling code, which
> > watches the total latency of IOs in a given window and then adjusts the
> > queue depth of the group accordingly. This is meant to be a workload
> > protection controller, so whoever has the lowest latency target gets
> > preferential treatment, with no thought to fairness or proportionality. It
> > is meant to be work conserving, so as long as nobody is missing their
> > latency targets the disk is fair game.
> >
> > We have been testing this in production for several months now to get the
> > behavior right and we are finally at the point that it is working well in all of
> > our test cases. With this patch we protect our main workload (the web server)
> > and isolate out the system services (chef/yum/etc). This works well in the
> > normal case, smoothing out weird requests per second (RPS) dips that we would see
> > when one of the system services would run and compete for IO resources. This
> > also works incredibly well in the runaway task case.
> >
> > The runaway task use case is where we have some task that slowly eats up all of
> > the memory on the system (think a memory leak). Previously this sort of
> > workload would push the box into a swapping/oom death spiral that was only
> > recovered by rebooting the box. With this patchset and proper configuration of
> > the memory.low and io.latency controllers we're able to survive this test with
> > at most a 20% dip in RPS.
>
> Is this purely useful for spinning disks, or is there some
> applicability to SSDs and perhaps other storage devices? Some
> discussion on this topic would be useful.
>

Yes, we're using this on both SSDs and spinning rust. It should work on all
storage devices; you just have to adjust your latency targets to match what
each device can normally deliver.

> Patches 5, 7 & 14 look fine to me - go wild. #14 could do with a
> couple of why-we're-doing-this comments, but I say that about
> everything ;)
>

So that one was fun.
Our test has the main workload in the protected group and all the
system-specific stuff in an unprotected group, and then we run a memory hog in
the system group. Obviously this results in everybody dumping all caches
first, including the pages for the binaries themselves. Then when the
applications go to run they incur a page fault, which trips readahead. If
we're being throttled, this means we'll sit in the page fault handler for a
good long while.

Who cares, right? Well, apparently the main workload cares, because it talks
to a daemon about the current memory on the system so it can make intelligent
adjustments to its allocation strategy. That daemon also gathers a bunch of
other statistics and does things like 'ps', which walks /proc/, which has
entries that wait on mmap_sem. So suddenly being blocked in readahead means we
get weird latency spikes, because we're holding mmap_sem the whole time. So
instead we want to just skip readahead, so that we get throttled as little as
possible while holding mmap_sem.

The inflight READA bios also need to be aborted, and I have a patch for that as
well, but it depends on Jens' READA abort patches that he's still working on,
so that part will come after his stuff is ready.

Thanks,

Josef
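The mechanism in the cover letter (measure IO latency over a window, then adjust each group's queue depth so that the tightest latency target wins, and let everyone back in when nobody is missing) can be illustrated with a small userspace model. This is a hypothetical sketch, not the patchset's code: struct lat_group, lat_group_init(), record_io() and end_window() are invented names, and halving/doubling the depth is an assumed policy chosen only for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-cgroup state for a simplified latency controller.
 * Illustrates the idea in the cover letter; NOT the patchset's
 * actual data structures or algorithm.
 */
struct lat_group {
	const char *name;
	uint64_t target_us;     /* latency target for this group */
	uint64_t total_us;      /* summed IO latency in the current window */
	uint64_t nr_ios;        /* IOs completed in the current window */
	unsigned int depth;     /* queue depth currently allowed */
	unsigned int max_depth; /* unthrottled queue depth */
};

void lat_group_init(struct lat_group *g, const char *name,
		    uint64_t target_us, unsigned int max_depth)
{
	assert(max_depth > 0);
	g->name = name;
	g->target_us = target_us;
	g->total_us = 0;
	g->nr_ios = 0;
	g->depth = max_depth;
	g->max_depth = max_depth;
}

/* Account one completed IO against the group's current window. */
void record_io(struct lat_group *g, uint64_t lat_us)
{
	g->total_us += lat_us;
	g->nr_ios++;
}

/* A group misses if its average latency this window exceeds its target. */
static int missed_target(const struct lat_group *g)
{
	return g->nr_ios > 0 && g->total_us / g->nr_ios > g->target_us;
}

/* Run at the end of each window: adjust queue depths, then reset stats. */
void end_window(struct lat_group *groups, size_t n)
{
	uint64_t strictest_missed = UINT64_MAX;

	/* Find the tightest latency target that was missed, if any. */
	for (size_t i = 0; i < n; i++)
		if (missed_target(&groups[i]) &&
		    groups[i].target_us < strictest_missed)
			strictest_missed = groups[i].target_us;

	for (size_t i = 0; i < n; i++) {
		struct lat_group *g = &groups[i];

		if (strictest_missed == UINT64_MAX) {
			/* Nobody missed: work conserving, let depth recover. */
			g->depth = g->depth * 2 > g->max_depth ?
				   g->max_depth : g->depth * 2;
		} else if (g->target_us > strictest_missed) {
			/* A tighter-target group is suffering: throttle this one. */
			g->depth = g->depth > 1 ? g->depth / 2 : 1;
		}
		g->total_us = 0;
		g->nr_ios = 0;
	}
}
```

For example, with a protected group at a 10 ms target and a system group at 100 ms (both starting at depth 64), a window in which the protected group averages 25 ms halves the system group's depth to 32 and leaves the protected group alone; a later window in which both meet their targets lets the system group grow back to 64.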
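The readahead change discussed for #14 amounts to a small decision in the page fault path: always read the faulting page, but skip speculative readahead while the cgroup is being throttled, so mmap_sem is held for as short a time as possible. The sketch below models that decision in userspace. It is hypothetical: struct fault_ctx and fault_read_pages() are invented for illustration, and the cgroup_congested flag stands in for however the real patch detects throttling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of the fault-time readahead decision;
 * the struct and function names are illustrative, not kernel APIs.
 */
struct fault_ctx {
	bool holds_mmap_sem;   /* fault path holds mmap_sem while doing IO */
	bool cgroup_congested; /* our cgroup is currently being throttled */
	size_t ra_pages;       /* pages readahead would normally submit */
};

/* Return how many pages to read for this fault. */
size_t fault_read_pages(const struct fault_ctx *ctx)
{
	assert(ctx != NULL);

	/* The faulting page itself is always needed to make progress. */
	size_t pages = 1;

	/*
	 * Readahead is speculative. If we are throttled while holding
	 * mmap_sem, every extra page lengthens the time other tasks
	 * (e.g. 'ps' walking /proc) wait on that lock, so skip it.
	 */
	if (ctx->holds_mmap_sem && ctx->cgroup_congested)
		return pages;

	return ctx->ra_pages > pages ? ctx->ra_pages : pages;
}
```

The design choice is to give up only the optional work: the faulting page is still read (and may still be throttled), but the extra readahead pages, which would multiply the time spent blocked under mmap_sem, are dropped.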