Date: Tue, 17 Jun 2008 21:03:17 -0700 (PDT)
From: Linus Torvalds
To: Bron Gondwana
cc: Linux Kernel Mailing List, Nick Piggin, Andrew Morton, Rob Mueller,
    Andi Kleen, Ingo Molnar
Subject: Re: BUG: mmapfile/writev spurious zero bytes (x86_64/not i386, bisected, reproducable)
In-Reply-To: <20080618031406.GA4326@brong.net>
References: <1213682410.13174.1258837181@webmail.messagingengine.com>
    <1213682570.13708.1258839317@webmail.messagingengine.com>
    <20080618031406.GA4326@brong.net>

On Wed, 18 Jun 2008, Bron Gondwana wrote:
>
> For my sins, I appear to be becoming the world expert on
> that particular file.

Heh. Congrats ;)

> I've debugged skiplist bugs many times over, and completely rewritten
> the locking code. It really does some pretty evil things - the memory
> accesses look something like this:
>
> [file...................]
> [mmap^....^.^........^^..................................]
> [file...................++++++++++++]
> [mmap^....^.^........^^.^^ ^ ^^.....................]
>
> Where (^) is the bits that get accessed. All reads are via
> the mmap, all writes are done with retry_write or
> retry_writev (Cyrus library functions that keep hammering
> until all the bytes are written)

Is there any reason it doesn't use mmap(MAP_SHARED) and make the
modifications that way too?

Because quite frankly, the mixture of doing mmap() and write() system
calls is quite fragile - and I'm not saying that just because of this
particular bug, but because there are all kinds of nasty cache aliasing
issues with virtually indexed caches etc that just fundamentally mean that
it's often a mistake to mix mmap with read/write at the same time.

(For the same reason it's not a good idea to mix writing through an mmap()
and then using read() to read it - again, you can have some nasty aliasing
going on there).

So this particular issue was definitely a kernel bug (and big thanks for
making such a good test-case), but in general, it does sound like Cyrus is
actively trying to dig itself into a nasty hole there.

		Linus
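
For illustration only, here is a rough sketch of what "make the modifications that way too" could look like for the append pattern in the quoted diagram: grow the file, then store the new bytes through a MAP_SHARED mapping instead of calling write(). The helper name append_via_shared_mapping is hypothetical, not a Cyrus or kernel function, and the sketch assumes a single writer with no locking, which the real skiplist code would certainly need.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for this sketch, not Cyrus code):
 * append len bytes from buf to the file at path by storing through a
 * MAP_SHARED mapping rather than write()/writev().
 * Returns 0 on success, -1 on failure.
 */
int append_via_shared_mapping(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    if (len == 0)
        return 0;                       /* nothing to append */

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    size_t old_size = (size_t)st.st_size;
    size_t new_size = old_size + len;

    /* Grow the file first: touching mapped pages that lie wholly
     * beyond end-of-file raises SIGBUS. */
    if (ftruncate(fd, (off_t)new_size) < 0) {
        close(fd);
        return -1;
    }

    char *map = mmap(NULL, new_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* The modification itself is a plain memory store into the mapping. */
    memcpy(map + old_size, buf, len);

    /* Ask for the dirty pages to be written back before returning. */
    int rc = msync(map, new_size, MS_SYNC);

    munmap(map, new_size);
    close(fd);
    return rc;
}
```

With something along these lines, readers and the writer would both go through shared mappings of the same file, so the mmap()/write() mixture described above never arises. Remapping the whole file on every append is done here only to keep the sketch short.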