Subject: Re: [RFC PATCH 00/17] btrfs zoned block device support
To: Naohiro Aota, David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Damien Le Moal, Bart Van Assche, Matias Bjorling
References: <20180809180450.5091-1-naota@elisp.net>
From: Hannes Reinecke
Message-ID: <9a37f119-5e9f-ef98-88a3-45c0f936d9ad@suse.com>
Date: Fri, 10 Aug 2018 09:26:03 +0200
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08/09/2018 08:04 PM, Naohiro Aota wrote:
> This series adds zoned block device support to btrfs.
>
> A zoned block device consists of a number of zones.
> Zones are either conventional, accepting random writes, or sequential,
> requiring that writes be issued in LBA order from each zone's write pointer
> position. This patch series ensures that the sequential write constraint of
> sequential zones is respected while not fundamentally changing btrfs block
> and I/O management for blocks stored in conventional zones.
>
> To achieve this, the default dev extent size of btrfs is changed on zoned
> block devices so that dev extents are always aligned to a zone. Allocation
> of blocks within a block group is changed so that the allocation is always
> sequential from the beginning of the block group. To do so, an allocation
> pointer is added to block groups and used as the allocation hint. The
> allocation changes also ensure that blocks freed below the allocation
> pointer are ignored, resulting in sequential block allocation regardless of
> the block group usage.
>
> While the introduction of the allocation pointer ensures that blocks will be
> allocated sequentially, I/Os to write out newly allocated blocks may be
> issued out of order, causing errors when writing to sequential zones. This
> problem is solved by introducing a submit_buffer() function and changes to
> the internal I/O scheduler to ensure in-order issuing of write I/Os for
> each chunk, corresponding to the block allocation order in the chunk.
>
> The zones of a chunk are reset to allow reuse of the zones only when the
> block group is being freed, that is, when all the extents of the block group
> are unused.
>
> For btrfs volumes composed of multiple zoned disks, restrictions are added
> to ensure that all disks have the same zone size. This matches the existing
> constraint that all dev extents in a chunk must have the same size.
>
> Testing the patchset requires zoned block devices. Even if you don't
> have zoned devices, you can use tcmu-runner [1] to emulate zoned block
> devices. It can export emulated zoned block devices via iSCSI.
> Please see the README.md of tcmu-runner [2] for instructions on creating a
> zoned block device with tcmu-runner.
>
> [1] https://github.com/open-iscsi/tcmu-runner
> [2] https://github.com/open-iscsi/tcmu-runner/blob/master/README.md
>
> Patch 1 introduces the HMZONED incompatible feature flag to indicate that
> the btrfs volume was formatted for use on zoned block devices.
>
> Patches 2 and 3 implement functions to gather information on the zones of
> the device (zone types and write pointer positions).
>
> Patch 4 restricts the possible locations of super blocks to conventional
> zones to preserve the existing update-in-place mechanism for the super
> blocks.
>
> Patches 5 to 7 disable features which are not compatible with the sequential
> write constraints of zoned block devices. This includes fallocate and
> direct I/O support. Device replace is also disabled for now.
>
> Patches 8 and 9 tweak the extent buffer allocation for HMZONED mode to
> implement sequential block allocation in block groups and chunks.
>
> Patches 10 to 12 implement the new submit buffer I/O path to ensure sequential
> write I/O delivery to the device zones.
>
> Patches 13 to 16 modify several parts of btrfs to handle free blocks
> without breaking the sequential block allocation and sequential write order,
> and to handle zone reset for unused chunks.
>
> Finally, patch 17 adds the HMZONED feature to the list of supported
> features.
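
As a reading aid for the cover letter quoted above, here is a minimal Python sketch of the two mechanisms it describes: sequential allocation driven by an allocation pointer, and in-order issuing of writes. This is a toy model under stated assumptions, not code from the patch series. `BlockGroup`, `SubmitBuffer` and all of their method names are hypothetical, and the real submit_buffer() in the series may work quite differently.

```python
class BlockGroup:
    """Toy model (hypothetical) of a block group backed by a sequential zone."""

    def __init__(self, start, size):
        self.start = start      # first block address, assumed zone-aligned
        self.size = size        # number of blocks in the group
        self.alloc_ptr = 0      # allocation pointer, as an offset from start
        self.used = 0           # blocks that are still referenced

    def allocate(self, nblocks):
        """Hand out blocks strictly at the allocation pointer.

        Space freed below the pointer is ignored, so allocation stays
        sequential regardless of how the group is used.
        """
        if self.alloc_ptr + nblocks > self.size:
            return None         # group is full; caller must pick another one
        addr = self.start + self.alloc_ptr
        self.alloc_ptr += nblocks
        self.used += nblocks
        return addr

    def free(self, nblocks):
        """Drop the usage count; return True once the whole group is unused."""
        self.used -= nblocks
        return self.used == 0

    def reset(self):
        """Model a zone reset: only valid once every extent is unused."""
        if self.used != 0:
            raise ValueError("block group still has used extents")
        self.alloc_ptr = 0


class SubmitBuffer:
    """Toy model (hypothetical) that holds writes until they are in order."""

    def __init__(self, write_ptr):
        self.write_ptr = write_ptr  # next address the zone will accept
        self.pending = {}           # start address -> length, waiting writes
        self.issued = []            # writes in the order they reached the zone

    def submit(self, addr, nblocks):
        """Buffer the write, then issue everything contiguous from write_ptr."""
        self.pending[addr] = nblocks
        while self.write_ptr in self.pending:
            length = self.pending.pop(self.write_ptr)
            self.issued.append((self.write_ptr, length))
            self.write_ptr += length
```

In this model a write that arrives ahead of the write pointer simply waits in `pending` until the gap before it has been filled, which is the ordering property the cover letter asks of the submit buffer. Timeouts (patch 12) are deliberately left out.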
>
> Naohiro Aota (17):
>   btrfs: introduce HMZONED feature flag
>   btrfs: Get zone information of zoned block devices
>   btrfs: Check and enable HMZONED mode
>   btrfs: limit super block locations in HMZONED mode
>   btrfs: disable fallocate in HMZONED mode
>   btrfs: disable direct IO in HMZONED mode
>   btrfs: disable device replace in HMZONED mode
>   btrfs: align extent allocation to zone boundary
>   btrfs: do sequential allocation on HMZONED drives
>   btrfs: split btrfs_map_bio()
>   btrfs: introduce submit buffer
>   btrfs: expire submit buffer on timeout
>   btrfs: avoid sync IO prioritization on checksum in HMZONED mode
>   btrfs: redirty released extent buffers in sequential BGs
>   btrfs: reset zones of unused block groups
>   btrfs: wait existing extents before truncating
>   btrfs: enable to mount HMZONED incompat flag
>
And unfortunately this series fails to boot for me:

BTRFS error (device nvme0n1p5): zoned devices mixed with regular devices
BTRFS error (device nvme0n1p5): failed to init hmzoned mode: -22
BTRFS error (device nvme0n1p5): open_ctree failed

Needless to say, /dev/nvme0n1p5 is _not_ a zoned device.
Nor does the zoned device have a btrfs superblock ATM.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            zSeries & Storage
hare@suse.com                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)