From: "Austin S. Hemmelgarn"
To: Hannes Reinecke, dsterba@suse.cz, Naohiro Aota, David Sterba, linux-btrfs@vger.kernel.org, Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Damien Le Moal, Bart Van Assche, Matias Bjorling
Subject: Re: [RFC PATCH 00/17] btrfs zoned block device support
Date: Wed, 15 Aug 2018 07:25:27 -0400
Message-ID: <3896e121-0f68-6773-fd3e-921d89756349@gmail.com>
In-Reply-To: <9531d57f-2271-7eb8-b734-dac6d33f0ec1@suse.com>
References: <20180809180450.5091-1-naota@elisp.net> <20180813184251.GC24025@twin.jikos.cz> <86bddb14-104e-182b-29a1-6ab8150f09a8@suse.com> <057b6600-0fef-4067-54ca-216b591d43f8@gmail.com> <9531d57f-2271-7eb8-b734-dac6d33f0ec1@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2018-08-14 03:41, Hannes Reinecke wrote:
> On 08/13/2018 09:29 PM, Austin S. Hemmelgarn wrote:
>> On 2018-08-13 15:20, Hannes Reinecke wrote:
>>> On 08/13/2018 08:42 PM, David Sterba wrote:
>>>> On Fri, Aug 10, 2018 at 03:04:33AM +0900, Naohiro Aota wrote:
>>>>> This series adds zoned block device support to btrfs.
>>>>
>>>> Yay, thanks!
>>>>
> [ .. ]
>>>> Device replace is disabled, but the changelog suggests there's a way to
>>>> make it work, so it's a matter of implementation. And this should be
>>>> implemented at the time of merge.
>>>>
>>> How would a device replace work in general?
>>> While I do understand that device replace is possible with RAID
>>> thingies, I somewhat fail to see how one could do a device replacement
>>> without RAID functionality.
>>> Is it even possible?
>>> If so, how would it be different from a simple umount?
>> Device replace is implemented in largely the same manner as most other
>> live data migration tools (for example, LVM2's pvmove command).
>>
>> In short, when you issue a replace command for a given device, all
>> writes that would go to that device are instead sent to the new device.
>> While this is happening, old data is copied over from the old device to
>> the new one. Once all the data is copied, the old device is released
>> (and its BTRFS signature wiped), and the new device has its device ID
>> updated to that of the old device.
>>
>> This is possible largely because of the COW infrastructure, but it's
>> implemented in a way that doesn't entirely depend on it (otherwise it
>> wouldn't work for NOCOW files).
>>
>> Handling this on zoned devices is not likely to be easy, though; you
>> would functionally have to freeze I/O that would hit the device being
>> replaced so that you don't accidentally write to a sequential zone out
>> of order.
>
> Ah. Oh. Hmm.
>
> It would be possible in principle if we freeze accesses to any partially
> filled zones on the original device. Then all new writes will be going
> into new/empty zones on the new disks, and we can copy over the old data
> with no issue at all.
> We end up with some partially filled zones on the new disk, but they
> really should be cleaned up eventually either by the allocator filling
> up the partially filled zones or once garbage collection clears out
> stale zones.
>
> However, I fear the required changes to the btrfs allocator are beyond
> my btrfs knowledge :-(

The easy short-term solution is to just disallow the replace command
(with the intent of getting it working in the future), but ensure that
the older-style add/remove method works. That uses the balance code
internally, so it should honor any restrictions on block placement for
the new device, and therefore should be pretty easy to get working.
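
To make the zone-ordering concern concrete, here is a toy Python model of
the scheme discussed above. It is only a sketch and not btrfs code: the
Zone class, the replace_device function and the one-to-one zone mapping
are all invented for illustration. It assumes a sequential zone accepts a
write only at its current write pointer, that the old device's zones are
frozen for the duration of the replace, and that writes arriving during
the replace go only to otherwise empty zones on the new device.

```python
class Zone:
    """Toy sequential-write zone (hypothetical, not a kernel structure)."""

    def __init__(self, size):
        self.size = size
        self.blocks = []  # data written so far, in order

    @property
    def write_pointer(self):
        # The next writable offset is simply the number of blocks written.
        return len(self.blocks)

    def write(self, offset, data):
        # A sequential zone rejects any write that is not at the write pointer.
        if offset != self.write_pointer:
            raise OSError(
                f"out-of-order write at {offset}, "
                f"write pointer is {self.write_pointer}"
            )
        if self.write_pointer >= self.size:
            raise OSError("zone full")
        self.blocks.append(data)


def first_zone_with_room(zones):
    """Return the first zone that still has free space."""
    for zone in zones:
        if zone.write_pointer < zone.size:
            return zone
    raise OSError("no free space in the zones reserved for new writes")


def replace_device(old_zones, new_zones, incoming_writes):
    """Copy old_zones onto new_zones while new writes keep arriving.

    Assumption for this sketch: old zone i is copied into new zone i, and
    the remaining zones on the new device are reserved for new writes, so
    copy traffic and new writes never share a zone.
    """
    if len(new_zones) < len(old_zones):
        raise ValueError("new device has too few zones")
    fresh = new_zones[len(old_zones):]
    pending = list(incoming_writes)

    for src, dst in zip(old_zones, new_zones):
        for data in src.blocks:  # old zones are frozen, so this is stable
            dst.write(dst.write_pointer, data)
            if pending:
                # A new write arrives mid-copy; redirect it to a fresh zone.
                target = first_zone_with_room(fresh)
                target.write(target.write_pointer, pending.pop(0))

    # Writes that arrived after the copy finished.
    for data in pending:
        target = first_zone_with_room(fresh)
        target.write(target.write_pointer, data)
    return new_zones


if __name__ == "__main__":
    old = [Zone(4), Zone(4)]
    for item in ("a", "b", "c"):  # leave the first old zone partially filled
        old[0].write(old[0].write_pointer, item)
    old[1].write(0, "d")

    new = replace_device(old, [Zone(4) for _ in range(4)], ["x", "y", "z"])
    for index, zone in enumerate(new):
        print(index, zone.blocks)
```

In this model the copied zones on the new device end up partially filled,
which matches the point made above that the allocator or garbage
collection would have to clean them up later. A real implementation would
also have to deal with metadata, NOCOW files and the allocator itself,
none of which is modelled here.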