Date: Wed, 30 May 2018 09:26:29 -0700
From: Tejun Heo
To: Josef Bacik
Cc: axboe@kernel.dk, kernel-team@fb.com, linux-block@vger.kernel.org, akpm@linux-foundation.org, linux-mm@kvack.org, hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, Josef Bacik
Subject: Re: [PATCH 06/13] blkcg: add generic throttling mechanism
Message-ID: <20180530162629.GN1351649@devbig577.frc2.facebook.com>
References: <20180529211724.4531-1-josef@toxicpanda.com> <20180529211724.4531-7-josef@toxicpanda.com>
In-Reply-To: <20180529211724.4531-7-josef@toxicpanda.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Tue, May 29, 2018 at 05:17:17PM -0400, Josef Bacik wrote:
> +static void blkcg_scale_delay(struct blkcg_gq *blkg, u64 now)
> +{
> +	u64 old = atomic64_read(&blkg->delay_start);
> +
> +	if (old + NSEC_PER_SEC <= now &&

Maybe time_before64()?
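
For illustration of the time_before64() remark: the kernel's 64-bit time comparison helpers decide ordering from a signed difference instead of comparing the raw values, so the result stays correct even if the addition wraps. The snippet below is a userspace sketch of that idea, not kernel code; before_eq64(), window_elapsed() and naive_window_elapsed() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Userspace stand-in modelled on the kernel's time_before_eq64():
 * true if a is at or before b, judged by the signed difference.
 * Assumes two's complement conversion, as on all common platforms.
 */
static int before_eq64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) <= 0;
}

/* Hypothetical helper: has at least one second passed since old? */
static int window_elapsed(uint64_t old, uint64_t now)
{
	return before_eq64(old + NSEC_PER_SEC, now);
}

/* The open-coded comparison from the quoted hunk, for contrast. */
static int naive_window_elapsed(uint64_t old, uint64_t now)
{
	return old + NSEC_PER_SEC <= now;
}

/* Small self-check showing where the two forms disagree. */
void window_elapsed_demo(void)
{
	uint64_t old = UINT64_MAX - 10;

	/* Only 5ns have passed, but old + NSEC_PER_SEC wraps to a small value. */
	assert(naive_window_elapsed(old, old + 5) == 1);
	assert(window_elapsed(old, old + 5) == 0);
}
```

With nanosecond timestamps from ktime_get() a wrap is far away in practice, so this mainly shows the idiom and why it reads more clearly than the open-coded sum.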

> +	    atomic64_cmpxchg(&blkg->delay_start, old, now) == old) {
> +		u64 cur = atomic64_read(&blkg->delay_nsec);
> +		u64 sub = min_t(u64, blkg->last_delay, now - old);
> +		int cur_use = atomic_read(&blkg->use_delay);
> +
> +		if (cur_use < blkg->last_use)
> +			sub = max_t(u64, sub, blkg->last_delay >> 1);
> +
> +		/* This shouldn't happen, but handle it anyway. */
> +		if (unlikely(cur < sub)) {
> +			atomic64_set(&blkg->delay_nsec, 0);
> +			blkg->last_delay = 0;
> +		} else {
> +			atomic64_sub(sub, &blkg->delay_nsec);
> +			blkg->last_delay = cur - sub;
> +		}
> +		blkg->last_use = cur_use;

Can you please add some comments explaining the above? It's a lot of
logic.

> +static void blkcg_maybe_throttle_blkg(struct blkcg_gq *blkg, bool use_memdelay)
> +{

Maybe add a comment explaining that this is a cold path?

> +	u64 now = ktime_to_ns(ktime_get());
> +	u64 exp;
> +	u64 delay_nsec = 0;
> +	int tok;
> +
> +	while (blkg->parent) {
> +		if (atomic_read(&blkg->use_delay)) {
> +			blkcg_scale_delay(blkg, now);
> +			delay_nsec = max_t(u64, delay_nsec,
> +					   atomic64_read(&blkg->delay_nsec));
> +		}
> +		blkg = blkg->parent;
> +	}

Cuz the above may look too much otherwise.

...

> +void blkcg_maybe_throttle_current(void)
> +{
> +	struct request_queue *q = current->throttle_queue;
> +	struct cgroup_subsys_state *css;
> +	struct blkcg *blkcg;
> +	struct blkcg_gq *blkg;
> +	bool use_memdelay = current->use_memdelay;
> +
> +	if (!q)
> +		return;

The above would be the path taken in most cases, right?

> +
> +	current->throttle_queue = NULL;
> +	current->use_memdelay = false;

So, we only wait once, capped to 1s per blkcg_schedule_throttle()?
It'd be great to document the rationales.

> +	rcu_read_lock();
> +	css = kthread_blkcg();
> +	if (css)
> +		blkcg = css_to_blkcg(css);
> +	else
> +		blkcg = css_to_blkcg(task_css(current, io_cgrp_id));
> +
> +	if (!blkcg)
> +		goto out;
> +	blkg = blkg_lookup(blkcg, q);
> +	if (!blkg)
> +		goto out;
> +	blkg_get(blkg);

I don't think we can do blkg_get() on a blkg which is only protected
by rcu. We probably need blkg_tryget() here.

> +	rcu_read_unlock();
> +	blk_put_queue(q);
> +
> +	blkcg_maybe_throttle_blkg(blkg, use_memdelay);
> +	blkg_put(blkg);
> +	return;
> +out:
> +	rcu_read_unlock();
> +	blk_put_queue(q);
> +}
> +EXPORT_SYMBOL_GPL(blkcg_maybe_throttle_current);
> +
> +void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay)
> +{
> +	if (unlikely(current->flags & PF_KTHREAD))
> +		return;
> +
> +	if (!blk_get_queue(q))
> +		return;
> +
> +	if (current->throttle_queue)
> +		blk_put_queue(current->throttle_queue);
> +	current->throttle_queue = q;

Can't we set current->throttle_blkg directly?

> +static inline int blkcg_unuse_delay(struct blkcg_gq *blkg)
> +{
> +	int old = atomic_read(&blkg->use_delay);
> +
> +	if (old == 0)
> +		return 0;
> +
> +	while (old) {
> +		int cur = atomic_cmpxchg(&blkg->use_delay, old, old - 1);

Can we use atomic_dec_return() here?

> +		if (cur == old)
> +			break;
> +		cur = old;
> +	}
> +
> +	if (old == 0)
> +		return 0;
> +	if (old == 1)
> +		atomic_dec(&blkg->blkcg->css.cgroup->congestion_count);
> +	return 1;
> +}
> +
> +static inline void blkcg_clear_delay(struct blkcg_gq *blkg)
> +{
> +	int old = atomic_read(&blkg->use_delay);
> +	if (!old)
> +		return;
> +	if (atomic_cmpxchg(&blkg->use_delay, old, 0) == old)
> +		atomic_dec(&blkg->blkcg->css.cgroup->congestion_count);

atomic_add_unless()?

Thanks.

-- 
tejun
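
For illustration of the last two remarks: the open-coded cmpxchg loop amounts to "decrement unless already zero", which is the kind of operation atomic_add_unless() expresses in one call. The snippet below is a userspace model using C11 <stdatomic.h>, not the kernel atomic API; dec_unless_zero() and unuse_delay() are hypothetical names, and the congestion counter is a plain stand-in for the cgroup field in the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical userspace helper: decrement *v unless it is already 0.
 * Returns the value observed just before the decrement; 0 means the
 * counter was left untouched.
 */
static int dec_unless_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/*
		 * On failure the call stores the current value into old,
		 * so each retry uses a freshly observed expected value.
		 */
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			break;
	}
	return old;
}

/*
 * Model of the unuse path: drop one use, and when the last use goes
 * away (1 -> 0) also drop the stand-in congestion counter.
 * Returns 1 if a use was dropped, 0 if there was none.
 */
static int unuse_delay(atomic_int *use_delay, atomic_int *congestion_count)
{
	int old = dec_unless_zero(use_delay);

	if (old == 0)
		return 0;
	if (old == 1)
		atomic_fetch_sub(congestion_count, 1);
	return 1;
}
```

Returning the pre-decrement value keeps the 1 -> 0 transition visible to the caller, which is what decides whether the congestion counter is adjusted.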