Date: Thu, 20 May 2021 09:43:10 +0200 (CEST)
From: Geert Uytterhoeven
To: David Sterba
cc: linux-btrfs@vger.kernel.org, linux-kernel@vger.kernel.org, Arnd Bergmann
Subject: Re: [PATCH] btrfs: scrub: per-device bandwidth control
In-Reply-To: <20210518144935.15835-1-dsterba@suse.com>
References: <20210518144935.15835-1-dsterba@suse.com>
User-Agent: Alpine 2.22 (DEB 394 2020-01-19)
X-Mailing-List: linux-kernel@vger.kernel.org

Hi David,

On Tue, 18 May 2021, David Sterba wrote:
> Add sysfs interface to limit io during
> scrub. We relied on the ionice
> interface to do that, eg. the idle class let the system usable while
> scrub was running. This has changed when mq-deadline got widespread and
> did not implement the scheduling classes. That was a CFQ thing that got
> deleted. We've got numerous complaints from users about degraded
> performance.
>
> Currently only BFQ supports that but it's not a common scheduler and we
> can't ask everybody to switch to it.
>
> Alternatively the cgroup io limiting can be used but that also a
> non-trivial setup (v2 required, the controller must be enabled on the
> system). This can still be used if desired.
>
> Other ideas that have been explored: piggy-back on ionice (that is set
> per-process and is accessible) and interpret the class and classdata as
> bandwidth limits, but this does not have enough flexibility as there are
> only 8 allowed and we'd have to map fixed limits to each value. Also
> adjusting the value would need to lookup the process that currently runs
> scrub on the given device, and the value is not sticky so would have to
> be adjusted each time scrub runs.
>
> Running out of options, sysfs does not look that bad:
>
> - it's accessible from scripts, or udev rules
> - the name is similar to what MD-RAID has
>   (/proc/sys/dev/raid/speed_limit_max or /sys/block/mdX/md/sync_speed_max)
> - the value is sticky at least for filesystem mount time
> - adjusting the value has immediate effect
> - sysfs is available in constrained environments (eg. system rescue)
> - the limit also applies to device replace
>
> Sysfs:
>
> - raw value is in bytes
> - values written to the file accept suffixes like K, M
> - file is in the per-device directory
>   /sys/fs/btrfs/FSID/devinfo/DEVID/scrub_speed_max
> - 0 means use default priority of IO
>
> The scheduler is a simple deadline one and the accuracy is up to nearest
> 128K.
>
> Signed-off-by: David Sterba

Thanks for your patch, which is now commit b4a9f4bee31449bc ("btrfs:
scrub: per-device bandwidth control") in linux-next.

noreply@ellerman.id.au reported the following failures for e.g.
m68k/defconfig:

ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!
ERROR: modpost: "__divdi3" [fs/btrfs/btrfs.ko] undefined!

> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1988,6 +1993,60 @@ static void scrub_page_put(struct scrub_page *spage)
> }
> }
>
> +/*
> + * Throttling of IO submission, bandwidth-limit based, the timeslice is 1
> + * second. Limit can be set via /sys/fs/UUID/devinfo/devid/scrub_speed_max.
> + */
> +static void scrub_throttle(struct scrub_ctx *sctx)
> +{
> +	const int time_slice = 1000;
> +	struct scrub_bio *sbio;
> +	struct btrfs_device *device;
> +	s64 delta;
> +	ktime_t now;
> +	u32 div;
> +	u64 bwlimit;
> +
> +	sbio = sctx->bios[sctx->curr];
> +	device = sbio->dev;
> +	bwlimit = READ_ONCE(device->scrub_speed_max);
> +	if (bwlimit == 0)
> +		return;
> +
> +	/*
> +	 * Slice is divided into intervals when the IO is submitted, adjust by
> +	 * bwlimit and maximum of 64 intervals.
> +	 */
> +	div = max_t(u32, 1, (u32)(bwlimit / (16 * 1024 * 1024)));
> +	div = min_t(u32, 64, div);
> +
> +	/* Start new epoch, set deadline */
> +	now = ktime_get();
> +	if (sctx->throttle_deadline == 0) {
> +		sctx->throttle_deadline = ktime_add_ms(now, time_slice / div);

ERROR: modpost: "__udivdi3" [fs/btrfs/btrfs.ko] undefined!

div_u64(bwlimit, div)

> +		sctx->throttle_sent = 0;
> +	}
> +
> +	/* Still in the time to send? */
> +	if (ktime_before(now, sctx->throttle_deadline)) {
> +		/* If current bio is within the limit, send it */
> +		sctx->throttle_sent += sbio->bio->bi_iter.bi_size;
> +		if (sctx->throttle_sent <= bwlimit / div)
> +			return;
> +
> +		/* We're over the limit, sleep until the rest of the slice */
> +		delta = ktime_ms_delta(sctx->throttle_deadline, now);
> +	} else {
> +		/* New request after deadline, start new epoch */
> +		delta = 0;
> +	}
> +
> +	if (delta)
> +		schedule_timeout_interruptible(delta * HZ / 1000);

ERROR: modpost: "__divdi3" [fs/btrfs/btrfs.ko] undefined!

I'm a bit surprised gcc doesn't emit code for the division by the
constant 1000, but emits a call to __divdi3().  So this has to become
div_u64(), too.

> +	/* Next call will start the deadline period */
> +	sctx->throttle_deadline = 0;
> +}

BTW, any chance you can start adding lore Link: tags to your commits, to
make it easier to find the email thread to reply to when reporting a
regression? Thanks!

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker.
But when I'm talking to journalists I just say "programmer" or something
like that.
				-- Linus Torvalds