Date: Tue, 1 Nov 2016 18:38:26 -0700 (PDT)
From: Hugh Dickins
To: Dave Chinner
cc: Hugh Dickins, Jakob Unterwurzacher, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: tmpfs returns incorrect data on concurrent pread() and truncate()

On Wed, 2 Nov 2016, Dave Chinner wrote:
> On Tue, Nov 01, 2016 at 04:51:30PM -0700, Hugh Dickins wrote:
> > On Wed, 26 Oct 2016, Jakob Unterwurzacher wrote:
> > >
> > > tmpfs seems to be incorrectly returning 0-bytes when reading from
> > > a file that is concurrently being truncated.
> >
> > That is an interesting observation, and you got me worried;
> > but in fact, it is not a tmpfs problem: if we call it a
> > problem at all, it's a VFS problem or a userspace problem.
> >
> > You chose a ratio of 3 preads to 1 ftruncate in your program below:
> > let's call that the Unterwurzacher Ratio, 3 for tmpfs; YMMV, but for
> > me 4 worked well to show the same issue on ramfs, and 15 on ext4.
> >
> > The Linux VFS does not serialize reads against writes or truncation
> > very strictly:
>
> Which is fine, because...
>
> > it's unusual to need that serialization, and most
>
> .... many filesystems need more robust serialisation as hole punching
> (and other fallocate-based extent manipulations) have much stricter
> serialisation requirements than truncate and these ....
>
> > users prefer maximum speed to the additional locking, or intermediate
> > buffering, that would be required to avoid the issue you've seen.
>
> .... require additional locking to be done at the filesystem level
> to avoid race conditions.
>
> Throw in the fact that we already have to do this serialisation in
> the filesystem for direct IO, as there are no page locks to serialise
> direct IO against truncate. And we need to lock out page faults
> from refaulting while we are doing things like punching holes (to
> avoid data coherency and corruption bugs), so we need more
> filesystem level locks to serialise mmap against fallocate().
>
> And DAX has similar issues - there are no struct pages to serialise
> read or mmap access against truncate, so again we need filesystem
> level serialisation for this.
>
> Put simply: page locks are insufficient as a generic mechanism for
> serialising filesystem operations. The locking required for this is
> generally deeply filesystem implementation specific, so it's fine
> that the VFS doesn't attempt to provide anything stricter than it
> currently does....

I think you are saying that: xfs already provides the extra locking
that avoids this issue; most other filesystems do not; but more can
be expected to add that extra locking in the coming months?

Hugh
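
[Editor's note: Jakob's reproducer is referred to above ("your program
below") but is not quoted in this message.  The following is a minimal
sketch, not the original program, of the kind of test being discussed:
three reader threads pread() the tail of a tmpfs file while one loop
ftruncate()s and rewrites it, failing if a full-length read comes back
as all zeroes.  The file path /dev/shm/testfile, the buffer sizes and
the 0xaa fill pattern are illustrative assumptions.]

/*
 * Hypothetical reproducer sketch for the pread()/truncate() race.
 * Build with: cc -O2 -pthread repro.c -o repro
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SIZE	(1024 * 1024)
#define NREAD	3		/* 3 preads per ftruncate: the "Unterwurzacher Ratio" for tmpfs */

static int fd;

static void *reader(void *arg)
{
	char buf[4096];

	for (;;) {
		ssize_t n = pread(fd, buf, sizeof(buf), SIZE - sizeof(buf));

		/* A short read while truncated is fine; a full page of zeroes is the bug */
		if (n == (ssize_t)sizeof(buf) &&
		    buf[0] == 0 && !memcmp(buf, buf + 1, n - 1)) {
			fprintf(stderr, "pread returned a page of zeroes\n");
			exit(1);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;
	char *data = malloc(SIZE);
	int i;

	memset(data, 0xaa, SIZE);	/* non-zero fill pattern */
	fd = open("/dev/shm/testfile", O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0 || pwrite(fd, data, SIZE, 0) != SIZE) {
		perror("setup");
		return 1;
	}

	for (i = 0; i < NREAD; i++)
		pthread_create(&tid, NULL, reader, NULL);

	for (;;) {			/* truncator: shrink, then restore the data */
		ftruncate(fd, 0);
		pwrite(fd, data, SIZE, 0);
	}
}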