Received: by 2002:a25:31c3:0:0:0:0:0 with SMTP id x186csp4166164ybx; Mon, 4 Nov 2019 08:50:53 -0800 (PST) X-Google-Smtp-Source: APXvYqw0sFcBfTRhN0Fr0tSn2pYU+v6YZBow1EREs1wi0TRENmLalFxFPkXkjzl2oEmaDxof33cU X-Received: by 2002:a17:906:1fd5:: with SMTP id e21mr3701786ejt.320.1572886253353; Mon, 04 Nov 2019 08:50:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1572886253; cv=none; d=google.com; s=arc-20160816; b=qQCVZJldCBNOwOXU+c2hs16d9j6OZCAxa7jbP3imnsmdboLSrbeWYojTz+61NzFvRH SVd7zrxA6yk/OwKvhCmD797pcOTmOieKIlSJHjrMtZP/CIB/lZGQa5NhE9KZxGwN60ot FfvlFFTMGG/tykM0m6WcjK6ucl4gCrpJj3TnyJg3ClFwvAyJ2R/AFGhtXOHdQHMuHMvI /XSg93hSyi1mylM6iXCLgz1Ab7x3BJI+hwTfk2BdyQdrbS5lUTFUg2NaJ8DGPoyDluHm idi5s5u2hzlM7ZcY+qdjuUAZSbcYxupIKbPoors2frlB04BSs6DUrcve+SYigKjM2LjB w05w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=2GCXse3i8dZWY3iqcdbG+HD0y8Pc0P0D/i6paZcwU8A=; b=1Kq9UnUDnPk1zo5C13BKaa6Fl4RYa5xuT8QP3e7b/B57NFG16AJ6G2Wlmld2RF+vcL eZsOz7TherVUIuwC3dXwLhqNZy38ZUiEXFJONlouQxlCmlDbOL5Vgw1fO28IxvdkDAVz xn2RFnhIfDdO9FJ4mxwCx6yjIDhG+GKuIodV6uohC4dBb5gTUQd9myOHnbyTDgJ6w9mz MmsGe8yJx5+RgU8u0cnbOiWaGpC1fX0qLogPvaiVU2YQ6bv0pwfPYiUUn3B6f/UK6Jne ZFHkwe2xU9AK3JnpOVYxiUCit1XyI7ahhDunRHapiOnOo9Q9dT+K7GKq7eHlnFR5DnpT JCeA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@android.com header.s=20161025 header.b=uU7Krz5F; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=android.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id um11si11469635ejb.263.2019.11.04.08.50.30; Mon, 04 Nov 2019 08:50:53 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@android.com header.s=20161025 header.b=uU7Krz5F; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=android.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728321AbfKDQtG (ORCPT + 99 others); Mon, 4 Nov 2019 11:49:06 -0500 Received: from mail-wm1-f67.google.com ([209.85.128.67]:53764 "EHLO mail-wm1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727861AbfKDQtG (ORCPT ); Mon, 4 Nov 2019 11:49:06 -0500 Received: by mail-wm1-f67.google.com with SMTP id x4so6301233wmi.3 for ; Mon, 04 Nov 2019 08:49:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=android.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=2GCXse3i8dZWY3iqcdbG+HD0y8Pc0P0D/i6paZcwU8A=; b=uU7Krz5FyeHVlvMbtIzCRn0+S2fQ4+7O8QJiCjuVTYJQPBx2WQGcY2NLBUeM9wn25R ROpekZAVJdynYiLoqnmy6yth/tVdTozzYI4O4kTuodyCLkwBH9rPqN91EyaJ8P2FjEB6 vTrulmhp2Lv5osMWzTLwC8qhEylr5fJ4Q/07FsjB64uDW4p4XsD76qDv3gUYh0THoJu4 lzIebTQu7vHAsZYYIOFukkOEqsGynlPelbBzIPg7q4kh9EW5trD2FL5HHV28FBzaLLyq puDlYKiOT9yIxGVaAhZ+MmOApLMavJgW7KoyQrdoqfE9LJtu3YuNqKaPug9LBt2XYaFj SBvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=2GCXse3i8dZWY3iqcdbG+HD0y8Pc0P0D/i6paZcwU8A=; b=LCpZcQ5z7wtCjRJ/PQSKFL7vpad3tvw1uUtxLZS6Dv/cbcaKOF6uQcPkKqAukYqCBO fp5K1Oplt0ZZHa7Ng3zvc7PeDPoszGXH4TWJLBSA+eUd5BLvptyqKOHyELhisxpURP0M AgOrnLOhci5QkhqajcPyZYWHyKGW1kt4htfFyoUOWtRBcKDM+REWFbUJSqnt2xlPCnQh Nz1b8BZRbHpW6QuRByuaGJu3cxaqNr/hpiCGMZYJvQ9RR2F7hWbIlXFx/C20h82t3bIp zzHxZ6aXOI6uZxSwAMi5ZGL2AlaXzwuW7urmhGQ2px8/RbLZB6o6Eg+E8PKVoU1ILqjS ffKg== X-Gm-Message-State: APjAAAWnqOZjz84aUHsqEb04tcPVQ9Z0ZnXby5+VYSJJFCWtpUTHuh+u kzHveDMZSxbPQXo8bAoXJc5jEg== X-Received: by 2002:a1c:60d7:: with SMTP id u206mr15589wmb.101.1572886143360; Mon, 04 Nov 2019 08:49:03 -0800 (PST) Received: from google.com ([2a00:79e0:d:210:e751:37a0:1e95:e65d]) by smtp.gmail.com with ESMTPSA id x205sm23003638wmb.5.2019.11.04.08.49.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Nov 2019 08:49:02 -0800 (PST) Date: Mon, 4 Nov 2019 16:49:00 +0000 From: Alessio Balsini To: Mikulas Patocka Cc: Jens Axboe , Alasdair G Kergon , elsk@google.com, dvander@google.com, dm-devel@redhat.com, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com Subject: Re: dm-snapshot for system updates in Android Message-ID: <20191104164900.GA10934@google.com> References: <20191025101624.GA61225@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Mikulas, Thank you for your answer and suggestions. On Tue, Oct 29, 2019 at 10:21:14AM -0400, Mikulas Patocka wrote: > Hi > > On Fri, 25 Oct 2019, Alessio Balsini wrote: > > > Hello everyone! > > > > I hope you will appreciate knowing that we are currently evaluating the use of > > dm-snapshot to implement a mechanism to obtain revertible, space-efficient > > system upgrades in Android. More specifically, we are using > > dm-snapshot-persistent to test the updated device after reboot, then issue a > > merge in case of success, otherwise, destroy the snapshot. > > This new update mechanism is still under evaluation, but its development is > > openly done in AOSP. > > > > At the current stage, we have a prototype we are happy with, both in terms of > > space consumption overhead (for the COW device) and benchmarking results for > > read-write and merge operations. > > > > I would be glad if you could provide some feedback on a few points that I don't > > have completely clear. > > > > > > -- Interface stability > > > > To obtain an initial, empty COW device as quick as possible, we force to 0 only > > its first 32 bit (magic field). This solution looks clear from the kernel code, > > but can we rely on that for all the kernels with SNAPSHOT_DISK_VERSION == 1? > > It will work, but, to be consistent with lvm, I suggest to overwrite the > first 4k with zeroes. > > > Would you appreciate it if a similar statement is added as part of > > /Documentation, making this solution more stable? Or maybe I can think of > > adding an initialization flag to the dm-snapshot table to explicitly request > > the COW initialization within the kernel? > > > > Another issue we are facing is to be able to know in advance what the minimum > > COW device size would be for a given update to be able to allocate the right > > This is hard to say, it depends on what the user is doing with the phone. > When dm-snapshot runs out of space, it invalidates the whole snapshot. > You'll have to monitor the snapshot space very carefully and take action > before it fills up. I forgot to mention that all the partitions we are updating are read-only, and can only be modified by snapshot-merge. This allows us to establish a direct relation between the required COW device size and the operations performed by the update (i.e. the number of chunks that are going to be modified). > > I suggest - run main system on the origin target and attach a snapshot > that will be used for backup of the data overwritten in the origin. If the > updated system fails, merge the snapshot back into the origin; if the > update succeeds, drop the snapshot. If the user writes too much data to > the device, it would invalidate the only the snapshot (so he can't revert > anymore), but it would not invalidate the origin and the data would not be > lost. This is an approach we evaluated, but the main reason why we decided for the solution of updating the snapshot and then merging it to the base device is that we want to be sure that the update was successful before permanently change to the base device. For example, if for some reason the update is interrupted, it would be more difficult to roll-back or restore the update. Additionally, if the update wants to resize the partitions, this operation could not be done until reboot. > > > size for the COW device in advance. To do so, we rely on the current COW > > structure that seems to have kept the same stable shape in the last decade, and > > compute the total COW size by knowing the number of modified chunks. The > > formula would be something like that: > > > > table_line_bytes = 64 * 2 / 8; > > exceptions_per_chunk = chunk_size_bytes / table_line_bytes; > > total_cow_size_chunks = 1 + 1 + modified_chunks > > + modified_chunks / exceptions_per_chunk; > > > > This formula seems to be valid for all the recent kernels we checked. Again, > > can we assume it to be valid for all the kernels for which > > SNAPSHOT_DISK_VERSION == 1? > > Yes, we don't plan to change it. > > > -- Alignment > > > > Our approach follows the solution proposed by Mikulas [1]. > > Being the block alignment of file extents automatically managed by the > > filesystem, using FIEMAP should have no alignment-related performance issue. > > But in our implementation we hit a misalignment [2] branch which leads to > > dmwarning messages [3, 4]. > > > > I have a limited experience with the block layer and dm, so I'm still > > struggling in finding the root cause for this, either in user space or kernel > > space. > > I don't know. What is the block size of the filesystem? Are all mappings > aligned to this block size? Here follows a just generated warning coming from a Pixel 4 kernel (4.14): [ 3093.443808] device-mapper: table: 253:16: adding target device dm-15 caused an alignment inconsistency: physical_block_size=4096, logical_block_size=4096, alignment_offset=61440, start=0 Does this contain all the info you asked for? I started investigating this issue, but since we didn't notice any performance degradation, I prioritized other things. I'll be hopefully able to get back to this warning in the next months. Please let me know if I can help you with that or if you need additional information. > > > But our benchmarks seems to be good, so we were thinking as last option to > > rate-limit or directly remove that warning from our kernels as a temporary > > solution, but we prefer to avoid diverging from mainline. Rate-limiting is a > > solution that would make sense also to be proposed in the list, but completely > > removing the warning doesn't seem the right thing to do. Maybe we are > > benchmarking something else? What do you think? > > > > Many thanks for taking the time to read this, feedbacks would be highly > > appreciated. > > > > Regards. > > Alessio > > > > [1] https://www.redhat.com/archives/dm-devel/2018-October/msg00363.html > > [2] https://elixir.bootlin.com/linux/v5.3/source/block/blk-settings.c#L540 > > [3] https://elixir.bootlin.com/linux/v5.3/source/drivers/md/dm-table.c#L484 > > [4] https://elixir.bootlin.com/linux/v5.3/source/drivers/md/dm-table.c#L1558 > > Mikulas > Thanks again, Alessio