Date: Thu, 6 Sep 2018 11:17:18 +0200
From: Rogier Wolff
To: Dave Chinner
Cc: Jeff Layton, 焦晓冬, bfields@fieldses.org, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org
Subject: Re: POSIX violation by writeback error
Message-ID: <20180906091718.GL24519@BitWizard.nl>
References: <20180904161203.GD17478@fieldses.org>
 <20180904162348.GN17123@BitWizard.nl>
 <20180904185411.GA22166@fieldses.org>
 <09ba078797a1327713e5c2d3111641246451c06e.camel@redhat.com>
 <20180905120745.GP17123@BitWizard.nl>
 <20180906025709.GZ5631@dastard>
In-Reply-To: <20180906025709.GZ5631@dastard>
Organization: BitWizard B.V.
On Thu, Sep 06, 2018 at 12:57:09PM +1000, Dave Chinner wrote:
> On Wed, Sep 05, 2018 at 02:07:46PM +0200, Rogier Wolff wrote:
> > And this has worked for years because the kernel caches stuff from
> > inodes and data-blocks. If you suddenly write stuff to harddisk at
> > 10ms for each seek between inode area and data-area..
>
> You're assuming an awful lot about filesystem implementation here.
> Neither ext4, btrfs or XFS issue physical IO like this when flushing
> data.

My thinking is this: when fsync() (implicit or explicit) needs to know
the result of the underlying IO, it has to wait for that IO to have
actually happened.

You can either log the data in the logfile (journal) or just the
metadata; by default, most people choose the latter.

In the "make sure it hits storage" case, you have three areas:

 * the logfile
 * the inode area
 * the data area.

When you allow the application to continue past a close(), you can
gather up, say, a few megabytes of updates to each area and do, say,
50 seeks per second (achieving maybe about 50% of the throughput
performance of your drive).

If you don't have to store the /data/, you can stay in the inode or
logfile area and get a high throughput out of your drive.

But what use is it that a crash leaves the filesystem in a defined
state, if your application is in a bad state because it is getting bad
data?

Of course the application can be rewritten to use multiple threads, so
that while one thread is waiting for a close() to finish, another can
open/write/close the next file. (A rough sketch of what I mean is
below my signature.) But there are existing applications run by users
who do not have the knowledge, or the option, to delve into the source
and rewrite the application to be multithreaded.

Your 100k files per second number is pretty close to mine. In real
life we are not going to see such extreme numbers, but in some cases
the benchmark does predict part of the performance of an application.
In practice, an application may spend 50% of its time thinking about
which file to make next, and the other 50% actually making files, 50k
times per second.

	Roger.

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
The plan was simple, like my brother-in-law Phil. But unlike Phil, this
plan just might work.
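
P.S. To make the multi-thread suggestion above concrete, here is a
rough sketch of my own (not anything from this thread): one thread per
in-flight file, so the waits in fsync()/close() overlap instead of
serializing. The file names, file size and thread count are made up
for illustration; compile with -pthread.

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS 4

/* Create one file, write it, and wait for the result.  fsync() is
 * where a writeback error becomes visible to the application, because
 * it has to wait for the IO to actually complete before it can report
 * success or failure. */
static void *write_one_file(void *arg)
{
        long idx = (long)arg;
        char path[32], buf[4096];
        int fd;

        snprintf(path, sizeof(path), "file-%ld.dat", idx);
        memset(buf, 'x', sizeof(buf));

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
                perror("open");
                return NULL;
        }
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
                perror("write");
        if (fsync(fd) < 0)              /* wait for data + metadata */
                perror("fsync");
        if (close(fd) < 0)
                perror("close");
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        long i;

        /* While one thread is blocked in fsync()/close(), the others
         * can already be writing their files. */
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, write_one_file, (void *)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        return 0;
}

Note that the threads only hide the latency of waiting; it is still
the fsync()/close() that makes any writeback error visible.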