Subject: Re: [RFC PATCH 00/17] btrfs zoned block device support
To: Naohiro Aota, David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Damien Le Moal, Bart Van Assche, Matias Bjorling
References: <20180809180450.5091-1-naota@elisp.net>
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
From: Hannes Reinecke
Date: Fri, 10 Aug 2018 09:04:59 +0200
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/09/2018 08:04 PM, Naohiro Aota wrote:
> This series adds zoned block device support to btrfs.
>
> A zoned block device consists of a number of zones.
> Zones are either conventional, accepting random writes, or sequential,
> requiring that writes be issued in LBA order from each zone's write pointer
> position. This patch series ensures that the sequential write constraint of
> sequential zones is respected while fundamentally not changing btrfs block
> and I/O management for blocks stored in conventional zones.
>
> To achieve this, the default dev extent size of btrfs is changed on zoned
> block devices so that dev extents are always aligned to a zone. Allocation
> of blocks within a block group is changed so that the allocation is always
> sequential from the beginning of the block group. To do so, an allocation
> pointer is added to block groups and used as the allocation hint. The
> allocation changes also ensure that blocks freed below the allocation
> pointer are ignored, resulting in sequential block allocation regardless of
> the block group usage.
>
> While the introduction of the allocation pointer ensures that blocks will be
> allocated sequentially, I/Os to write out newly allocated blocks may be
> issued out of order, causing errors when writing to sequential zones. This
> problem is solved by introducing a submit_buffer() function and changes to
> the internal I/O scheduler to ensure in-order issuing of write I/Os for
> each chunk, corresponding to the block allocation order in the chunk.
>
> The zones of a chunk are reset to allow reuse of the zones only when the
> block group is being freed, that is, when all the extents of the block group
> are unused.
>
> For btrfs volumes composed of multiple zoned disks, restrictions are added
> to ensure that all disks have the same zone size. This matches the existing
> constraint that all dev extents in a chunk must have the same size.
>
> Testing the patchset requires zoned block devices. Even if you don't
> have zoned devices, you can use tcmu-runner [1] to emulate zoned block
> devices. It can export emulated zoned block devices via iSCSI.
> Please see the README.md of tcmu-runner [2] for how to create a zoned block
> device on tcmu-runner.
>
> [1] https://github.com/open-iscsi/tcmu-runner
> [2] https://github.com/open-iscsi/tcmu-runner/blob/master/README.md
>
> Patch 1 introduces the HMZONED incompatible feature flag to indicate that
> the btrfs volume was formatted for use on zoned block devices.
>
> Patches 2 and 3 implement functions to gather information on the zones of
> the device (zone type and write pointer position).
>
> Patch 4 restricts the possible locations of super blocks to conventional
> zones to preserve the existing update in-place mechanism for the super
> blocks.
>
> Patches 5 to 7 disable features which are not compatible with the sequential
> write constraints of zoned block devices. This includes fallocate and
> direct I/O support. Device replace is also disabled for now.
>
> Patches 8 and 9 tweak the extent buffer allocation for HMZONED mode to
> implement sequential block allocation in block groups and chunks.
>
> Patches 10 to 12 implement the new submit buffer I/O path to ensure sequential
> write I/O delivery to the device zones.
>
> Patches 13 to 16 modify several parts of btrfs to handle free blocks
> without breaking the sequential block allocation and sequential write order,
> as well as zone reset for unused chunks.
>
> Finally, patch 17 adds the HMZONED feature to the list of supported
> features.
>

Thanks for doing all the work.
However, the patches don't apply cleanly to the current master branch.
Can you please rebase them?

Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                zSeries & Storage
hare@suse.com                      +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
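
For readers following the thread, the allocation pointer and the in-order write submission described in the quoted cover letter can be pictured with a toy model. This is only a minimal Python sketch under stated assumptions, not the patchset's code: the names BlockGroup, SubmitBuffer, allocate, free and submit are hypothetical, one block group is assumed to map onto exactly one sequential zone, and a zone reset is modelled simply by rewinding the pointer.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Toy block group backed by one sequential zone (hypothetical layout)."""
    start: int          # absolute offset of the first byte of the zone
    length: int         # zone size in bytes
    alloc_ptr: int = 0  # next allocation offset, relative to start
    used: int = 0       # bytes still referenced by live extents

    def allocate(self, size):
        """Return the absolute offset of a new extent, or None if it does not fit."""
        if size <= 0 or self.alloc_ptr + size > self.length:
            return None
        offset = self.start + self.alloc_ptr
        self.alloc_ptr += size  # the pointer only ever moves forward
        self.used += size
        return offset

    def free(self, size):
        """Drop an extent; space below the pointer is never handed out again.

        Returns True when the group became empty and the zone was "reset".
        """
        self.used -= size
        if self.used == 0 and self.alloc_ptr > 0:
            self.alloc_ptr = 0  # stands in for a zone reset
            return True
        return False


class SubmitBuffer:
    """Holds writes back until they line up with the zone write pointer."""

    def __init__(self, write_ptr):
        self.write_ptr = write_ptr
        self.pending = {}  # offset -> size of writes not yet issued
        self.issued = []   # (offset, size) in the order sent to the device

    def submit(self, offset, size):
        self.pending[offset] = size
        # Issue every buffered write that now starts at the write pointer.
        while self.write_ptr in self.pending:
            length = self.pending.pop(self.write_ptr)
            self.issued.append((self.write_ptr, length))
            self.write_ptr += length
```

In this model, freeing the first extent of a group and allocating again returns an offset past the pointer rather than the freed hole, and writes submitted in the order third, first, second still reach the device as first, second, third.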