Subject: Re: [PATCH] ext4: introduce EXT4_BG_WAS_TRIMMED to optimize trim
From: Reindl Harald
Organization: the lounge interactive design
To: Lukas Czerner
Cc: Wang Shilong, linux-ext4@vger.kernel.org, Wang Shilong, Shuichi Ihara, Andreas Dilger
Date: Wed, 27 May 2020 12:56:44 +0200

On 27.05.20 at 12:32, Lukas Czerner wrote:
> On Wed, May 27, 2020 at 12:11:52PM +0200, Reindl Harald wrote:
>>
>> On 27.05.20 at 11:57, Lukas Czerner wrote:
>>> On Wed, May 27, 2020 at 11:32:02AM +0200, Reindl Harald wrote:
>>>>
>>>>
>>>> On 27.05.20 at 11:19, Lukas Czerner wrote:
>>>>> On Wed, May 27, 2020 at 04:38:50PM +0900, Wang Shilong wrote:
>>>>>> From: Wang Shilong
>>>>>>
>>>>>> Currently the WAS_TRIMMED flag is not persistent: whenever the
>>>>>> filesystem is remounted, fstrim needs to walk all block groups again.
>>>>>> The problem with this is that FSTRIM can be slow on very large
>>>>>> SSD-based LUN filesystems.
>>>>>>
>>>>>> To avoid this kind of problem, we introduce a block group flag
>>>>>> EXT4_BG_WAS_TRIMMED. The side effect is that we need to introduce
>>>>>> one extra block group dirty write after trimming a block group.
>>>>
>>>> would that also fix the issue that *way too much* is trimmed all the
>>>> time, no matter if it's a thin-provisioned VMware disk or a physical
>>>> RAID10 with SSDs?
>>>
>>> no, the mechanism remains the same, but the proposal is to make it
>>> persistent across remounts.
>>>
>>>>
>>>> there is no way 315 MB were deleted within 2 hours or so on a system
>>>> with just 485M used
>>>
>>> The reason is that we're working at block group granularity. So if you
>>> have an almost free block group and you free some blocks from it, the
>>> flag gets cleared and the next time you run fstrim it'll trim all the
>>> free space in the group. Then again, if you free some blocks from the
>>> group, the flag gets cleared again ...
>>>
>>> But I don't think this is a problem at all. Certainly not worth tracking
>>> free/trimmed extents to solve it.
>>
>> it is a problem
>>
>> on a daily "fstrim -av" you trim gigabytes of already trimmed blocks,
>> which, for example, on a VMware thin-provisioned vdisk propagates down
>> to CBT (changed block tracking)
>>
>> so instead of completely ignoring that untouched space, thanks to CBT
>> it's considered changed and gets verified in the follow-up backup run,
>> which takes orders of magnitude longer than needed
>
> Looks like you identified the problem then ;)

well, in a perfect world...

> But seriously, trim/discard was always considered advisory and the
> storage is completely free to do whatever it wants to do with the
> information. It might even be the case that the discard requests are
> ignored and we might not even need an optimization like this. But
> regardless, it does take time to go through the block groups and as a
> result this optimization is useful in the fs itself.

luckily at least fstrim is non-blocking in a VMware environment; on my
physical box it takes ages

this machine *does nothing* but wait to be cloned; 235 MB of supposedly
deleted data within 50 minutes is absurd on a completely idle guest

so even though I am all for optimizations, that's way over the top

[root@master:~]$ fstrim -av
/boot: 0 B (0 bytes) trimmed on /dev/sda1
/: 235.8 MiB (247201792 bytes) trimmed on /dev/sdb1

[root@master:~]$ df
Filesystem Type  Size  Used Avail Use% Mounted on
/dev/sdb1  ext4  5.8G  502M  5.3G   9% /
/dev/sda1  ext4  485M   39M  443M   9% /boot

> However it seems to me that the situation you're describing calls for
> optimization on the storage side (TP vdisk in your case), not the file
> system side.
>
> And again, for fine-grained discard you can use -o discard

with a terrible performance impact at runtime
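The block group granularity Lukas describes can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not ext4 code: the names Group, free_blocks, trim_all and remount are hypothetical, each group is reduced to a free-block count plus a was-trimmed flag, and details such as the minimum extent length that fstrim accepts are ignored. It shows why freeing a handful of blocks makes the next trim pass discard the entire free space of that group again, and why a flag that is not persisted forces a full re-trim after every remount, which is what the EXT4_BG_WAS_TRIMMED patch aims to avoid.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical model of one block group (not the ext4 data structure)."""
    free_blocks: int
    was_trimmed: bool = False


def free_blocks(group: Group, n: int) -> None:
    # Freeing any number of blocks invalidates the whole group's trimmed state.
    group.free_blocks += n
    group.was_trimmed = False


def trim_all(groups: list[Group]) -> int:
    """fstrim-like pass: discard all free blocks in groups not marked trimmed."""
    trimmed = 0
    for g in groups:
        if g.was_trimmed:
            continue  # skip groups untouched since their last trim
        trimmed += g.free_blocks
        g.was_trimmed = True
    return trimmed


def remount(groups: list[Group], persistent: bool) -> None:
    # An in-memory flag is lost on remount; a persisted one survives.
    if not persistent:
        for g in groups:
            g.was_trimmed = False


if __name__ == "__main__":
    groups = [Group(free_blocks=30_000) for _ in range(4)]
    print(trim_all(groups))      # 120000: every free block, once
    print(trim_all(groups))      # 0: nothing freed since the last pass
    free_blocks(groups[0], 8)    # free just 8 blocks in group 0
    print(trim_all(groups))      # 30008: the whole group is trimmed again
    remount(groups, persistent=False)
    print(trim_all(groups))      # 120008: flags lost, every group again
    remount(groups, persistent=True)
    print(trim_all(groups))      # 0: flags survived the remount
```

In this model, persisting the flag only removes the full re-trim after a remount; freeing 8 blocks still re-trims all 30008 free blocks of that group, which matches Lukas's point that "the mechanism remains the same".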