From: Andreas Dilger
Date: Thu, 25 Feb 2021 22:14:07 -0700
Subject: Re: [PATCH 1/2] ext4: Handle casefolding with encryption
To: Daniel Rosenberg
Cc: Theodore Ts'o, Eric Biggers, Ext4 Developers List, Linux Kernel Mailing List, linux-fsdevel, Gabriel Krisman Bertazi, kernel-team@android.com, Paul Lawrence
Message-Id: <24A7BB91-16E7-4C9D-BD80-0B75927AF7B0@dilger.ca>
X-Mailing-List: linux-ext4@vger.kernel.org

On Feb 18, 2021, at 4:21 PM, Daniel Rosenberg wrote:
>
> On Wed, Feb 17, 2021 at 2:48 PM Andreas Dilger wrote:
>>
>> On Feb 17, 2021, at 9:08 AM, Theodore Ts'o wrote:
>>>
>>> The problem is in how the space after the filename in a directory is
>>> encoded. The dirdata format is (mildly) expandable, supporting up to
>>> 4 different metadata chunks after the filename, using a very
>>> compactly encoded TLV (or moral equivalent) scheme. For directory
>>> inodes that have both the encryption and casefold flags set, we have
>>> a single blob which gets used as the IV for the crypto.
>>>
>>> So it's the difference between a simple blob that is only used for one
>>> thing in this particular case, and something which is the moral
>>> equivalent of simple ASN.1 or protobuf encoding.
>>>
>>> Currently, dirdata has defined uses for 2 of the 4 "chunks", which are
>>> used in Lustre servers. The proposal which Andreas has suggested is
>>> that if the dirdata feature is supported, then the 3rd dirdata chunk
>>> would be used for the data currently stored by the
>>> encrypted-casefolded extension, and the 4th would be reserved for a
>>> to-be-defined extension mechanism.
>>>
>>> If ext4 encrypted/casefold is not yet in use, and we can get the
>>> changes out to all potential users before they release products out
>>> into the field, then one approach would be to only support
>>> encrypted/casefold when dirdata is also enabled.
>>>
>>> If ext4 encrypted/casefold is in use, my suggestion is that we support
>>> both encrypted/casefold && !dirdata as you have currently implemented
>>> it, and encrypted/casefold && dirdata as Andreas has proposed.
>>>
>>> IIRC, supporting Andreas's scheme essentially means that we use
>>> the top four bits in the rec_len field to indicate which chunks are
>>> present, and then for each chunk which is present, there is a 1-byte
>>> length followed by payload.
>>> So that means in the case where it's encrypted/casefold && dirdata,
>>> the required storage of the directory entry would take one additional
>>> byte, plus setting a bit indicating that the encrypted/casefold
>>> dirdata chunk was present.
>>
>> I think your email already covers pretty much all of the points.
>>
>> One small difference between the current "raw" encrypted/casefold hash
>> and dirdata is that the former is 4-byte aligned within the dirent,
>> while dirdata is packed. So in 3/4 cases dirdata would take the same
>> amount of space (the 1-byte length would fill one of the 1-3 bytes of
>> padding that the raw format leaves after the name), since the next
>> dirent needs to be aligned anyway.
>>
>> The other implication here is that the 8-byte hash may need to be
>> copied out of the dirent into a local variable before use, due to
>> alignment issues, but I'm not sure if that is actually needed or not.
>>
>>> So, no, they aren't ultimately incompatible, but it might require a
>>> tiny bit more work to integrate the combined support for dirdata plus
>>> encrypted/casefold. One way we can do this, if we have to support the
>>> current encrypted/casefold format because it's out there in deployed
>>> implementations already, is to integrate encrypted/casefold &&
>>> !dirdata upstream first, and then when we integrate dirdata into
>>> upstream, we'll have to add support for the encrypted/casefold &&
>>> dirdata case. This means that we'll have two variants of the on-disk
>>> format to test and support, but I don't think it's going to be
>>> that difficult.
>>
>> It would be possible to detect if the encrypted/casefold+dirdata
>> variant is in use, because the dirdata variant would have the 0x40
>> bit set in the file_type byte. It isn't possible to positively
>> identify the "raw" non-dirdata variant, but the assumption would be
>> that if (rec_len >= 8 + round_up(name_len, 4) + 8) in an
>> encrypted+casefold directory (8-byte dirent header, padded name,
>> 8-byte hash), then the "raw" hash must be present in the dirent.
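To make the two on-disk variants concrete, here is a minimal C sketch that computes the rec_len each layout needs and fetches the 8-byte hash from either one. It rests on assumptions that the thread does not settle: the dirdata presence bits are taken from the high nibble of file_type (so the 0x40 flag is the 3rd chunk, index 2), rather than from rec_len as Ted recalled; each chunk is a 1-byte payload length followed by the payload, placed directly after the name; and the raw hash sits at the first 4-byte boundary after the name. The helpers raw_rec_len(), dirdata_rec_len(), dirdata_chunk() and get_dirent_hash() are hypothetical names, not functions from the patch or from Lustre.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic ext4 dirent header (on-disk, little-endian), 8 bytes in total:
 * inode (4 bytes), rec_len (2), name_len (1), file_type (1), then the name. */
#define DE_HDR_LEN      8u
#define DE_REC_LEN_OFF  4u
#define DE_NAME_LEN_OFF 6u
#define DE_FTYPE_OFF    7u
#define HASH_LEN        8u   /* 32-bit hash + 32-bit minor hash */
#define HASH_CHUNK      2u   /* assumed: 3rd dirdata chunk (flag 0x40) */

static size_t round_up4(size_t n) { return (n + 3u) & ~(size_t)3u; }

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Byte-wise loads, so the pointer need not be 4-byte aligned. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Raw variant: the hash starts at the first 4-byte boundary after the name. */
static size_t raw_rec_len(size_t name_len)
{
    return round_up4(DE_HDR_LEN + name_len) + HASH_LEN;
}

/* Assumed dirdata variant: name, 1-byte length, 8-byte hash, all packed,
 * with only the end of the entry padded to 4 bytes. */
static size_t dirdata_rec_len(size_t name_len)
{
    return round_up4(DE_HDR_LEN + name_len + 1u + HASH_LEN);
}

/* Return the payload offset of dirdata chunk idx (0..3) and store its length
 * in *len_out, or return 0 if the chunk is absent or overruns rec_len.
 * Assumption: presence bits are the high nibble of file_type (0x10 = chunk 0,
 * ..., 0x40 = chunk 2), and each present chunk is <len byte><payload>. */
static size_t dirdata_chunk(const uint8_t *de, size_t rec_len, unsigned idx,
                            uint8_t *len_out)
{
    unsigned mask = de[DE_FTYPE_OFF] >> 4;
    size_t off = DE_HDR_LEN + de[DE_NAME_LEN_OFF];

    for (unsigned i = 0; i < 4; i++) {
        if (!(mask & (1u << i)))
            continue;
        if (off + 1 > rec_len || off + 1 + de[off] > rec_len)
            return 0;
        if (i == idx) {
            *len_out = de[off];
            return off + 1;
        }
        off += 1u + de[off];   /* skip this chunk's length byte and payload */
    }
    return 0;
}

/* Fetch hash/minor_hash from an encrypted+casefolded dirent in either
 * variant. Returns 0 on success, -1 if no hash can be located. */
static int get_dirent_hash(const uint8_t *de, uint32_t *hash, uint32_t *minor)
{
    size_t rec_len = get_le16(de + DE_REC_LEN_OFF);
    size_t name_len = de[DE_NAME_LEN_OFF];
    uint8_t len = 0;
    size_t off;

    if (de[DE_FTYPE_OFF] & 0x40u) {
        /* dirdata variant: positively identified by the 0x40 flag */
        off = dirdata_chunk(de, rec_len, HASH_CHUNK, &len);
        if (off == 0 || len != HASH_LEN)
            return -1;
    } else if (rec_len >= raw_rec_len(name_len)) {
        /* raw variant: only inferred from rec_len leaving room for a hash */
        off = round_up4(DE_HDR_LEN + name_len);
    } else {
        return -1;
    }
    /* The packed dirdata hash may sit at an odd offset; get_le32() copes,
     * where kernel code would copy it out or use get_unaligned_le32(). */
    *hash = get_le32(de + off);
    *minor = get_le32(de + off + 4);
    return 0;
}
```

For name_len of 5, 6 or 7 both layouts need the same 24-byte rec_len; only when name_len is a multiple of 4 does the packed dirdata form cost an extra 4 bytes, which is the 3-out-of-4 observation in the thread. The raw branch is only an inference from rec_len, so real code would still check the directory's encrypt and casefold flags before trusting it.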
>
> So sounds like we're going with the combined version. Andreas, do you
> have any suggestions for changes to the casefolding patch to ease the
> eventual merging with dirdata? A bunch of the changes are already
> pretty similar, so some of it is just calling essentially the same
> functions different things.

One thing I would suggest is to change is_fake_entry() from using
offsets in the leaf block to using the content of the dirent to make
that decision. Comparing entries against "." and ".." is trivial (and
already done in many places), and the checksum entry/tail has a "magic"
file type that can be used. This will avoid potential problems if, e.g.,
encrypted entries are stored inline with the inode, and/or if dirdata
also adds fields to "." and "..".

Also, the patch adds the use of "lblk" all around the code, but that
wouldn't be needed if is_fake_entry() were updated as above, would it?

Note that in find_group_orlov() the filename hash doesn't strictly need
to match the actual hash used in the directory. It is only used to find
a suitable group for allocating the inode, so it can be any relatively
uniform hash function and could remain DX_HASH_HALF_MD4.
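As a rough illustration of the content-based check suggested above, the sketch below classifies a dirent as "fake" purely from its own fields, with no lblk or offset argument. dirent_is_fake() is a hypothetical name, the fields are assumed to be already converted to host byte order, and only the cases named here are handled: ".", "..", and the metadata_csum tail, whose file type is EXT4_FT_DIR_CSUM (0xDE) and whose record is 12 bytes long. Other special entries, such as the fake dirents in htree index blocks, are not considered.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT4_FT_DIR_CSUM 0xDE   /* file_type of the checksum tail entry */
#define DIR_TAIL_REC_LEN 12u    /* sizeof(struct ext4_dir_entry_tail) */

/* Hypothetical content-based replacement for an offset-based check:
 * decide from the dirent itself rather than from its (lblk, offset). */
static bool dirent_is_fake(uint32_t inode, uint16_t rec_len, uint8_t name_len,
                           uint8_t file_type, const char *name)
{
    /* "." and ".." are recognised by name alone. */
    if (name_len == 1 && name[0] == '.')
        return true;
    if (name_len == 2 && name[0] == '.' && name[1] == '.')
        return true;
    /* metadata_csum tail: zero inode and name_len, magic file type. */
    if (file_type == EXT4_FT_DIR_CSUM && inode == 0 && name_len == 0 &&
        rec_len == DIR_TAIL_REC_LEN)
        return true;
    return false;
}
```

Because "." and ".." are stored as plain names even in encrypted directories, the name comparison does not depend on the directory's encryption policy, and the check keeps working wherever the entries happen to live (leaf block or inline data).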
Cheers, Andreas