Received: by 2002:a05:6358:16cc:b0:ea:6187:17c9 with SMTP id r12csp8110575rwl; Tue, 10 Jan 2023 09:09:19 -0800 (PST) X-Google-Smtp-Source: AMrXdXutmDYr4m1q9yG0JJe7iyp4LrfJySKrDCCcCsY3CPjop1BJAPhKbg52+on2G4z0fOyxXNYv X-Received: by 2002:a05:6402:914:b0:479:dc9c:1144 with SMTP id g20-20020a056402091400b00479dc9c1144mr23121524edz.24.1673370558853; Tue, 10 Jan 2023 09:09:18 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1673370558; cv=none; d=google.com; s=arc-20160816; b=a9iJl0TvBICxQlnjw1afH/wY03OWatjv+JDlNYQxkq0oYT8yKP9q5YgWmiSVy5JW2c ltMPH60YR71xTQSPTLcLFrUhyKM44vT+X/93ZYkfmBJJyv0/48LGtHv47tqIBtn7R7qH qg2bugbQQm7sGajIpAWQBebgAYRV/ZemxtP57QoNDL3ibTzQlCnQz3TiSjoKIMaf17Xa JHAKtGWJsZD2LM044m3ouegWbiiTA4ORCoL1ziQ3d89mOCH7PrI/fNOufPyO7DF5AOIL 5RRL4yjoNrne9iLu3FZrMPmXncnjK8imtZYJ8/YOFctAntqScNq2wKRZK4ODkdft1jMg hGsw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=EMKV6nf2FFKubT9BPJN4AKDdipkI2cilqJuASvqNxbk=; b=wvxHVL8i4wHdczvi1vloN0xo1/1toTWqnllHPpTSnSxyam/MSyl1fDbqbCmnfpnx4E pjQ2gbOJ584rItb3wBlj6V/1Rsr5QR8LheAyimysGoJbNfecLjl6rpaDcZutaWNVw38A MbTBEXGRVLMFo5JYR77cwRhNjXiDMND7aiShjWh475DJxKn6M7HaOtnFCvYMhx8Awjgw eUr49OrIXsXPOz/hcI63oYOg9HHj93Rs6mDXk2Z6J1oSEXorEeh7Les2vpGTneBvu7sd GaS8il4L/HjVm/Hc9xygfgjIr5d2yKpc54stJFbiCBqexomgkuGGVttyvetkvUEXGkJQ RfcA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ftaTeE9a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c12-20020a056402120c00b0049314618772si10639125edw.21.2023.01.10.09.09.05; Tue, 10 Jan 2023 09:09:18 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=ftaTeE9a; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233930AbjAJQfd (ORCPT + 53 others); Tue, 10 Jan 2023 11:35:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238969AbjAJQei (ORCPT ); Tue, 10 Jan 2023 11:34:38 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F1A6387F0E for ; Tue, 10 Jan 2023 08:33:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1673368400; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=EMKV6nf2FFKubT9BPJN4AKDdipkI2cilqJuASvqNxbk=; b=ftaTeE9aoy7MXExjbQwNZzHtPhTmcUGAVpAC2cX7pI9XadiDI5LWQLbuR4s8r0NvLs3tQU 10vRy3qwDXs7QUJB3YlrrHSbAIb0CFhlNjsQbsvpP1pWKXkYQ2Q5kxvzkv7LXSyFydGcjl r7nHQZ+T4C8sbnkoMaOSW/Jn9WM0xCw= Received: from mail-lj1-f200.google.com (mail-lj1-f200.google.com [209.85.208.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-14-QiahpmRdMZmPSpb0tIsmMw-1; Tue, 10 Jan 2023 11:33:15 -0500 X-MC-Unique: QiahpmRdMZmPSpb0tIsmMw-1 Received: by mail-lj1-f200.google.com with SMTP id b25-20020a2e9899000000b002877a271a9dso574901ljj.20 for ; Tue, 10 Jan 2023 08:33:15 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=EMKV6nf2FFKubT9BPJN4AKDdipkI2cilqJuASvqNxbk=; b=SM9sT7Th3P0aKjDp+VG2LgWCXL1sJCKwmbKKQEpsagwUAb5sRz/hzGrFY7Dvdw+Jjt jBjrtJsrCLTV/Od6PaITZN0TtkJ+WoXMEQSVX/c4fMNd5XzJA5qMM3KxtZyq7Td10fXm xdAJ0ba4na1k6yC83WAvTEt44YRCO3e3hSYi/UXaduhf8ZDA1GClpn1zrK/sR4+rZ/yn jR7SoFist9JZo7OB6aCSKQq79c/n1x6wtsQET2VPlOf+VdV8SQlHioGj2qyJOFTlQIFJ 4sN7K3YtM2URbeGpeCJykPEwyqKn+PkdpkaL0fczcRdMj9Vqr2GyYNBKGUhAM3e3V5UA DbgA== X-Gm-Message-State: AFqh2koCmUiHTcv7tjta0uvu+JaSJhYfcCvn3o6bgTw+IMJJP46HVmxm fruvBK85raIhsHn4Ob1PGUS948wFSsi/CNbA9biDoEBjFzyXwSDGHZ9QufnMNOFMWDEq5OCJ72w KwgCAl0pO30heAaVT29eCBWL4 X-Received: by 2002:a2e:9609:0:b0:286:6ae7:5068 with SMTP id v9-20020a2e9609000000b002866ae75068mr1299106ljh.26.1673368394087; Tue, 10 Jan 2023 08:33:14 -0800 (PST) X-Received: by 2002:a2e:9609:0:b0:286:6ae7:5068 with SMTP id v9-20020a2e9609000000b002866ae75068mr1299102ljh.26.1673368393912; Tue, 10 Jan 2023 08:33:13 -0800 (PST) Received: from greebo.mooo.com (c-e6a5e255.022-110-73746f36.bbcust.telenor.se. [85.226.165.230]) by smtp.gmail.com with ESMTPSA id p11-20020a2eb98b000000b0027fed440702sm1362364ljp.98.2023.01.10.08.33.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 Jan 2023 08:33:13 -0800 (PST) Message-ID: <3e491cfe02590cf5fce49851cf8fa040e0a4c8f6.camel@redhat.com> Subject: Re: [PATCH 3/6] composefs: Add descriptor parsing code From: Alexander Larsson To: Brian Masney Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, gscrivan@redhat.com Date: Tue, 10 Jan 2023 17:33:12 +0100 In-Reply-To: References: <1c4c49fac5bb6406a8cb55ca71f8060703aa63f6.1669631086.git.alexl@redhat.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.2 (3.46.2-1.fc37) MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 2023-01-05 at 15:02 -0500, Brian Masney wrote: > On Mon, Nov 28, 2022 at 12:16:59PM +0100, Alexander Larsson wrote: > > This adds the code to load and decode the filesystem descriptor > > file > > format. > >=20 > >=20 > > +#define CFS_N_PRELOAD_DIR_CHUNKS 4 >=20 > From looking through the code it appears that this is actually the > maximum number of chunks. Should this be renamed from PRELOAD to MAX? >=20 No, this is the number of directory chunks we statically pre-load while loading the inode. If there are more (i.e. for larger directories) we load chunks dynamically as needed during the directory operations. >=20 > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0header->magic =3D cfs_u32_fr= om_file(header->magic); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0header->data_offset =3D cfs_= u64_from_file(header- > > >data_offset); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0header->root_inode =3D cfs_u= 64_from_file(header->root_inode); >=20 > Should the cpu to little endian conversion occur in cfs_read_data()? cfs_read_data() reads just a block of opaque data. It can't know which parts make up individual values to convert. > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (header->magic !=3D CFS_M= AGIC || > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 header->d= ata_offset > ctx.descriptor_len || > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 sizeof(st= ruct cfs_header_s) + header->root_inode > > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ctx.descriptor_len) { > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0res =3D -EINVAL; >=20 > Should this be -EFSCORRUPTED? >=20 I don't think so. I can see the argument for it, but at this point we just looked at a file based on a mount argument, and it seems completely wrong (not just corrupt in the middle). Reporting EINVAL in the mount syscall for "invalid argument" seems more right to me. > >=20 > > +static void *cfs_get_vdata_buf(struct cfs_context_s *ctx, u64 > > offset, u32 len, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct cfs_buf *buf) > > +{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (offset > ctx->descriptor= _len - ctx->header.data_offset) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return ERR_PTR(-EINVAL); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (len > ctx->descriptor_le= n - ctx->header.data_offset - > > offset) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return ERR_PTR(-EINVAL); >=20 > It appears that these same checks are already done in cfs_get_buf(). >=20 Not quite. It is true that we check in cfs_get_buf() that the arguments are not completely outside the file. However, this part checks that the offset is specifically within the vdata part of the file. In particular, if we rely on the checks in cfs_get_buf() we could get confused if the below data_offset + offset wrapped. > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return cfs_get_buf(ctx, ctx-= >header.data_offset + offset, > > len, buf); > > +} >=20 >=20 > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0ino->flags =3D cfs_read_u32(= &data); > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0inode_size =3D cfs_inode_enc= oded_size(ino->flags); >=20 > Should CFS_INODE_FLAGS_DIGEST_FROM_PAYLOAD also be accounted for in > cfs_inode_encoded_size()? No, that flag just means that the already existing payload (which contains the pathname for the backing data) should be used as the source for the digest (to avoid storing it twice). It doesn't take up any extra space otherwise. > Also, cfs_inode_encoded_size() is only used here so can be brought > into this file. >=20 I see this as sort of part of the filesystem on-disk format, so I rather have it in the cfs.h header with the rest of the fs definition. >=20 > > +static bool cfs_validate_filename(const char *name, size_t > > name_len) > > +{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (name_len =3D=3D 0) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return false; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (name_len =3D=3D 1 && nam= e[0] =3D=3D '.') > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return false; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (name_len =3D=3D 2 && nam= e[0] =3D=3D '.' && name[1] =3D=3D '.') >=20 > Can strcmp() be used here? >=20 name is not zero-terminated, so I don't think so. At least not in any natural way. > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0inode_data->has_digest =3D r= et !=3D 0; >=20 > Can you do 'has_digest =3D inode_data->digest !=3D NULL;' to get rid of > the need for return 1 in cfs_get_digest(). No, because ->digest is an in-line array, not a pointer. > > +static inline int memcmp2(const void *a, const size_t a_size, > > const void *b, > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= size_t b_size) > > +{ > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0size_t common_size =3D min(a= _size, b_size); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0int res; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0res =3D memcmp(a, b, common_= size); > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (res !=3D 0 || a_size =3D= =3D b_size) > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0return res; > > + > > +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0return a_size < b_size ? -1 = : 1; >=20 > This function appears to be used only in one place below. It doesn't > seem like it matters for the common_size. Can this just be dropped > and use memcmp()? Not sure what you mean by this. This is used as a sort/order function, not just as an "is-the-same" check, so we have to report e.g. that "prefixXYZ" is after "prefix". In other words, this is essentially strcmp(), but the strings are not zero terminated. --=20 =3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D= -=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D-=3D- =3D-=3D-=3D Alexander Larsson Red Hat, Inc=20 alexl@redhat.com alexander.larsson@gmail.com=20 He's a deeply religious Amish hairdresser who hides his scarred face=20 behind a mask. She's a mentally unstable gypsy bodyguard in the witness protection program. They fight crime!=20