From: Ilya Dryomov
Date: Mon, 27 Jan 2020 19:16:17 +0100
Subject: Re: [RFC PATCH 0/3] parallel 'copy-from' Ops in copy_file_range
To: Luis Henriques
Cc: Jeff Layton, Sage Weil, "Yan, Zheng", Gregory Farnum, Ceph Development, LKML
In-Reply-To: <20200127164321.17468-1-lhenriques@suse.com>

On Mon, Jan 27, 2020 at 5:43 PM Luis Henriques wrote:
>
> Hi,
>
> As discussed here[1] I'm sending an RFC patchset that does the
> parallelization of the requests sent to the OSDs during a copy_file_range
> syscall in CephFS.
>
> [1] https://lore.kernel.org/lkml/20200108100353.23770-1-lhenriques@suse.com/
>
> I've also some performance numbers that I wanted to share.  Here's a
> description of the very simple tests I've run:
>
>  - create a file with 200 objects in it
>    * i.e. tests with different object sizes mean different file sizes
>  - drop all caches and umount the filesystem
>  - Measure:
>    * mount filesystem
>    * full file copy (with copy_file_range)
>    * umount filesystem
>
> Tests were repeated several times and the average value was used for
> comparison.
>
> DISCLAIMER:
> These numbers are only indicative, and different clusters and client
> configs will for sure show different performance!
> More rigorous tests would be required to validate these results.
>
> Having as baseline a full read+write (basically, a copy_file_range
> operation within a filesystem mounted without the 'copyfrom' option),
> here's some values for different object sizes:
>
>                           8M      4M      1M      65k
>   read+write              100%    100%    100%    100%
>   sequential              51%     52%     83%     >100%
>   parallel (throttle=1)   51%     52%     83%     >100%
>   parallel (throttle=0)   17%     17%     83%     >100%
>
> Notes:
>
> - 'parallel (throttle=0)' was a test where *all* the requests (i.e. 200
>   requests to copy the 200 objects in the file) were sent to the OSDs and
>   the wait for requests completion is done at the end only.
>
> - 'parallel (throttle=1)' was just a control test, where the wait for
>   completion is done immediately after a request is sent.  It was expected
>   to be very similar to the non-optimized ('sequential') tests.
>
> - These tests were executed on a cluster with 40 OSDs, spread across 5
>   (bare-metal) nodes.
>
> - The tests with object size of 65k show that copy_file_range definitely
>   doesn't scale to files with small object sizes.  '> 100%' actually means
>   more than 10x slower.
>
> Measuring the mount+copy+umount masks the actual difference between
> different throttle values due to the time spent in mount+umount.  Thus,
> there was no real difference between throttle=0 (send all and wait) and
> throttle=20 (send 20, wait, send 20, ...).  But here's what I observed
> when measuring only the copy operation (4M object size):
>
>   read+write              100%
>   parallel (throttle=1)   56%
>   parallel (throttle=5)   23%
>   parallel (throttle=10)  14%
>   parallel (throttle=20)  9%
>   parallel (throttle=5)   5%

Was this supposed to be throttle=50?

>
> Anyway, I'll still need to revisit patch 0003 as it doesn't follow the
> suggestion done by Jeff to *not* add another knob to fine-tune the
> throttle value -- this patch adds a kernel parameter for a knob that I
> wanted to use in my testing to observe different values of this throttle
> limit.
>
> The goal is probably to drop this patch and do the throttling in patch
> 0002.  I just need to come up with a decent heuristic.  Jeff's suggestion
> was to use rsize/wsize, which are set to 64M by default IIRC.  Somehow I
> feel that it should be related to the number of OSDs in the cluster
> instead, but I'm not sure how.  And testing these sorts of heuristics would
> require different clusters, which isn't particularly easy to get.  Anyway,
> comments are welcome!

I agree with Jeff: this throttle is certainly not worth a module
parameter (or a mount option).  I would start with something like
C * (wsize / object size) and pick C between 1 and 4.

Thanks,

                Ilya
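
For illustration only, the starting point suggested above,
C * (wsize / object size), could be sketched in plain userspace C roughly
as follows.  The function name copy_from_throttle, its parameters, and the
fallback to a minimum of one request are assumptions made for this sketch;
none of them come from the patchset.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patchset): upper bound on the number
 * of 'copy-from' requests kept in flight, computed as
 * C * (wsize / object_size).
 */
static uint64_t copy_from_throttle(uint64_t wsize, uint64_t object_size,
                                   unsigned int c)
{
    uint64_t per_window;

    /* Assumption: a zero object size is treated as "one request at a time". */
    if (object_size == 0)
        return 1;

    per_window = wsize / object_size;

    /* Assumption: objects larger than wsize still allow one request. */
    if (per_window == 0)
        per_window = 1;

    return (uint64_t)c * per_window;
}
```

With a 64M wsize and 4M objects, this gives 16 requests in flight for
C = 1 and 64 for C = 4.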