Received: by 2002:a05:6a10:2726:0:0:0:0 with SMTP id ib38csp1137377pxb; Wed, 6 Apr 2022 09:34:20 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy33ELHZZ2tdsGBVJydGMgLKZy3irVdrSt1yIwxyMHmQ1M/xam8xSBNGm63drslPhseycLH X-Received: by 2002:a17:903:2498:b0:156:e4f:b01d with SMTP id p24-20020a170903249800b001560e4fb01dmr9396297plw.84.1649262859966; Wed, 06 Apr 2022 09:34:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649262859; cv=none; d=google.com; s=arc-20160816; b=ICju8bdeiWLa1lS0hT4ao0ZSXiKvuH9Izwv5emEnkJSifQxVvJbTwvcb6j4XW/UueF 118j+rGVCQGBygYU78+A1jXv65HEwyGR/xJpumZyjiLWA+rAI05aGsqpz4lIXu3N/FLc HCWs1gibl28XltZJS3eggkyfawjPqjJ1ka/48dehrApmez9OaLCN7U2cA/fNY2kr/xy6 qzIMrVipbatZswLycszvUHzMC1d+mZN3JoVPMjS52Hozaoq9Gf8PSfu+HSgOr8SbK0VP YUzQzAdn1BHzJTYQm54dNsurkZRJC+mkXFTJMpF6nFD2kk6uiTVvjHU+tcmfz6OJZM0Y Bleg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-language:content-transfer-encoding :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject:dkim-signature; bh=yd6as5x4Uder5Km5LfKQ4Jc4Svl079Qfrglg3q3KIGk=; b=SAXTID3NYdsgtXtimEiDWFfPU4+kApPCi72dkwqpABEA4aF88L9NTnqFZOMR425fdj h86VFZLWopBqSOmxmKwG+ww47dc1TlaRFXGOU6psyyMZNpNxcUjdTEIvv+WkGJhvhO9N Vodc7VEpxw/BlsM5/4aQMXgmMlN9khAS9HE3vUz+E1ZYwiyHF7V/L+oF3d2tVFKejAiS lmUmo2yYNvtZ/vHQTnNnOFb7aubeo+H2mT7Qp4bYLWNXQ0+sAXkmfIOW7y8uVcg/duEj iL9g9D8TahiPWYI6QBg3ZpCYL4waCuljVt0L/MnhZkb2Ll+AYFqK89USqlqjkNi7dgCc 4rgA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HpSae94z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id e9-20020a170903240900b00153b2d1656csi16883135plo.372.2022.04.06.09.34.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Apr 2022 09:34:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=HpSae94z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id BB0B52D9A18; Wed, 6 Apr 2022 08:42:28 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236188AbiDFPn4 (ORCPT + 99 others); Wed, 6 Apr 2022 11:43:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58236 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236261AbiDFPnk (ORCPT ); Wed, 6 Apr 2022 11:43:40 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0DF234E6F66 for ; Wed, 6 Apr 2022 06:11:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649250617; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=yd6as5x4Uder5Km5LfKQ4Jc4Svl079Qfrglg3q3KIGk=; b=HpSae94zP5Mh/o4U9jcoJgPQD3oJOP2oDtFkkPjVj4lX/NgVzy5cuNTStVVNGn0CaxTgqL MVq/gJ/CzfiaJy+qsz9X+tME/fgEuHJWN3ChDvpwpswkBhBhijSKPDyqhp9XM+L6ME7vb0 baDpc0M+AdGa0irgCnTp1yfSklIZAC8= Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-640-vbbORDxsOwKnBDYmN34Sag-1; Wed, 06 Apr 2022 09:10:16 -0400 X-MC-Unique: vbbORDxsOwKnBDYmN34Sag-1 Received: by mail-pf1-f197.google.com with SMTP id 138-20020a621690000000b004fa807ac59aso1511220pfw.19 for ; Wed, 06 Apr 2022 06:10:16 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding :content-language; bh=yd6as5x4Uder5Km5LfKQ4Jc4Svl079Qfrglg3q3KIGk=; b=M8Ie6Cy4PSXdiPy/e6r5LOjovELkMlfnfv4XL7RLAmm0fn5RrkCbm220rqbXOCVvV/ 9Nmt2v38PTLCpRHA+ZLqYlF/P8jTWqdtAtvZs2yFny+87rIA0PLfNRe0PedD7ZkC8aMb iJ2kuydy7NACWfH7t4hyx8o6WyNydM3FlpWchnuXhGj/777VX0qRxmpD3bEktopQQES9 6DM57hj67C0hsdp95ezgvN2cbVIB4KkAay6xy0oaKFfJS5ZRum71ww6Y4sJFaGYBLgFx KTcVs12qCxATeU7DRQlw3KrroNS2m4Xl6TkY0gnxzXMXSjKyLOt7dWerA7WGJ7VOkub2 VVjQ== X-Gm-Message-State: AOAM530kjbwZut56vJmEacUhZLKf2EaMwUTx+aZAoX/iMW1GaEy1CB31 oOHARxFtWS4kQHZMEq1pN91lq/TcywThijEuR08CsnsNyYqFdtVr9Pv+imEr5oIi4mo/X6ZDjHs s50ybziuBfpr3l/ioTiigJBvW1eESFRBDq1LDFEnQfm/BhfYnUASLQklT05eKmRATA7DUsJv3Ow == X-Received: by 2002:a17:902:ec8c:b0:154:2e86:dd51 with SMTP id x12-20020a170902ec8c00b001542e86dd51mr8499226plg.99.1649250614718; Wed, 06 Apr 2022 06:10:14 -0700 (PDT) X-Received: by 2002:a17:902:ec8c:b0:154:2e86:dd51 with SMTP id x12-20020a170902ec8c00b001542e86dd51mr8499189plg.99.1649250614344; Wed, 06 Apr 2022 06:10:14 -0700 (PDT) Received: from [10.72.13.31] ([209.132.188.80]) by smtp.gmail.com with ESMTPSA id h20-20020a056a001a5400b004fb1b4b010asm19651730pfv.162.2022.04.06.06.10.11 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 06 Apr 2022 06:10:13 -0700 (PDT) Subject: Re: [PATCH v2] ceph: invalidate pages when doing DIO in encrypted inodes To: Jeff Layton , =?UTF-8?Q?Lu=c3=ads_Henriques?= Cc: Ilya Dryomov , ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org References: <20220401133243.1075-1-lhenriques@suse.de> <878rsia391.fsf@brahms.olymp> <6ba91390-83e8-8702-2729-dc432abd3cc5@redhat.com> <87zgky8n0o.fsf@brahms.olymp> <6306fba71325483a1ea22fa73250c8777ea647d7.camel@kernel.org> From: Xiubo Li Message-ID: <321104e6-36db-c143-a7ba-58f9199e6fb7@redhat.com> Date: Wed, 6 Apr 2022 21:10:08 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1 MIME-Version: 1.0 In-Reply-To: <6306fba71325483a1ea22fa73250c8777ea647d7.camel@kernel.org> Content-Type: text/plain; charset=iso-8859-15; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,RDNS_NONE,SPF_HELO_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 4/6/22 7:48 PM, Jeff Layton wrote: > On Wed, 2022-04-06 at 12:33 +0100, Lu?s Henriques wrote: >> Xiubo Li writes: >> >>> On 4/6/22 6:57 PM, Lu?s Henriques wrote: >>>> Xiubo Li writes: >>>> >>>>> On 4/1/22 9:32 PM, Lu?s Henriques wrote: >>>>>> When doing DIO on an encrypted node, we need to invalidate the page cache in >>>>>> the range being written to, otherwise the cache will include invalid data. >>>>>> >>>>>> Signed-off-by: Lu?s Henriques >>>>>> --- >>>>>> fs/ceph/file.c | 11 ++++++++++- >>>>>> 1 file changed, 10 insertions(+), 1 deletion(-) >>>>>> >>>>>> Changes since v1: >>>>>> - Replaced truncate_inode_pages_range() by invalidate_inode_pages2_range >>>>>> - Call fscache_invalidate with FSCACHE_INVAL_DIO_WRITE if we're doing DIO >>>>>> >>>>>> Note: I'm not really sure this last change is required, it doesn't really >>>>>> affect generic/647 result, but seems to be the most correct. >>>>>> >>>>>> diff --git a/fs/ceph/file.c b/fs/ceph/file.c >>>>>> index 5072570c2203..b2743c342305 100644 >>>>>> --- a/fs/ceph/file.c >>>>>> +++ b/fs/ceph/file.c >>>>>> @@ -1605,7 +1605,7 @@ ceph_sync_write(struct kiocb *iocb, struct iov_iter *from, loff_t pos, >>>>>> if (ret < 0) >>>>>> return ret; >>>>>> - ceph_fscache_invalidate(inode, false); >>>>>> + ceph_fscache_invalidate(inode, (iocb->ki_flags & IOCB_DIRECT)); >>>>>> ret = invalidate_inode_pages2_range(inode->i_mapping, >>>>>> pos >> PAGE_SHIFT, >>>>>> (pos + count - 1) >> PAGE_SHIFT); >>>>> The above has already invalidated the pages, why doesn't it work ? >>>> I suspect the reason is because later on we loop through the number of >>>> pages, call copy_page_from_iter() and then ceph_fscrypt_encrypt_pages(). >>> Checked the 'copy_page_from_iter()', it will do the kmap for the pages but will >>> kunmap them again later. And they shouldn't update the i_mapping if I didn't >>> miss something important. >>> >>> For 'ceph_fscrypt_encrypt_pages()' it will encrypt/dencrypt the context inplace, >>> IMO if it needs to map the page and it should also unmap it just like in >>> 'copy_page_from_iter()'. >>> >>> I thought it possibly be when we need to do RMW, it may will update the >>> i_mapping when reading contents, but I checked the code didn't find any >>> place is doing this. So I am wondering where tha page caches come from ? If that >>> page caches really from reading the contents, then we should discard it instead >>> of flushing it back ? >>> >>> BTW, what's the problem without this fixing ? xfstest fails ? >> Yes, generic/647 fails if you run it with test_dummy_encryption. And I've >> also checked that the RMW code was never executed in this test. >> >> But yeah I have assumed (perhaps wrongly) that the kmap/kunmap could >> change the inode->i_mapping. >> > No, kmap/unmap are all about high memory and 32-bit architectures. Those > functions are usually no-ops on 64-bit arches. Yeah, right. So they do nothing here. >> In my debugging this seemed to be the case >> for the O_DIRECT path. That's why I added this extra call here. >> > I agree with Xiubo that we really shouldn't need to invalidate multiple > times. > > I guess in this test, we have a DIO write racing with an mmap read > Probably what's happening is either that we can't invalidate the page > because it needs to be cleaned, or the mmap read is racing in just after > the invalidate occurs but before writeback. This sounds a possible case. > In any case, it might be interesting to see whether you're getting > -EBUSY back from the new invalidate_inode_pages2 calls with your patch. > If it's really this case maybe this should be retried some where ? -- Xiubo