Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755359AbbFUP4m (ORCPT );
        Sun, 21 Jun 2015 11:56:42 -0400
Received: from mail.parknet.co.jp ([210.171.160.6]:41324 "EHLO
        mail.parknet.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752928AbbFUP4d (ORCPT );
        Sun, 21 Jun 2015 11:56:33 -0400
X-Greylist: delayed 1225 seconds by postgrey-1.27 at vger.kernel.org;
        Sun, 21 Jun 2015 11:56:32 EDT
From: OGAWA Hirofumi
To: Jan Kara
Cc: Daniel Phillips, David Lang, Rik van Riel, tux3@tux3.org,
        linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [FYI] tux3: Core changes
References: <13c8bcdf-70e8-43d5-a05f-58ad839dbfd0@phunq.net>
        <5563F5C8.2040806@redhat.com>
        <67294911-1776-46b8-916d-0e5642a38725@phunq.net>
        <20150526070910.GA3307@quack.suse.cz>
        <20150526090058.GA8024@quack.suse.cz>
        <5564D60E.6000306@phunq.net>
        <20150527084138.GD2590@quack.suse.cz>
Date: Mon, 22 Jun 2015 00:36:00 +0900
In-Reply-To: <20150527084138.GD2590@quack.suse.cz> (Jan Kara's message of
        "Wed, 27 May 2015 10:41:38 +0200")
Message-ID: <87a8vtdqfz.fsf@mail.parknet.co.jp>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3323
Lines: 88

Jan Kara writes:

Hi,

> So there are a few things to have in mind:
> 1) There is nothing like a "writeable" page. Page is always writeable (at
> least on x86 architecture). When a page is mapped into some virtual address
> space (or more of them), this *mapping* can be either writeable or read-only.
> mkwrite changes the mapping from read-only to writeable but kernel /
> hardware is free to write to the page regardless of the mapping.
>
> 2) When kernel / hardware writes to the page, it first modifies the page
> and then marks it dirty.
>
> So what can happen in this scenario is:
>
> 1) You hand kernel a part of a page as a buffer.
> page_mkwrite() happens, page is dirtied, kernel notes a PFN of the page
> somewhere internally.
>
> 2) Writeback comes and starts writeback for the page.
>
> 3) Kernel ships the PFN to the hardware.
>
> 4) Userspace comes and wants to write to the page (different part than the
> HW is instructed to use). page_mkwrite is called, page is forked.
> Userspace writes to the forked page.
>
> 5) HW stores its data in the original page.
>
> Userspace never sees data from the HW! Data corrupted where without page
> forking everything would work just fine.

I'm not sure I'm understanding your pseudocode logic correctly, though.
This logic doesn't seem to be an issue specific to page forking, and
the pseudocode seems to be missing the locking and revalidation of the
page.

If you can show more details, it would be helpful to see them and
discuss the issue with page forking, or we can think about how to
handle the corner cases.

Well, before that, why do I need more details?

For example, replace the page fork at (4) with "truncate", "punch
hole", or "invalidate page".

Those operations remove the old page from the radix tree, so the
userspace write creates a new page, and the HW still references the
old page. (I.e. the situation should be the same as with page forking,
as I understand this pseudocode logic.)

IOW, this pseudocode logic seems to be broken even without page
forking if there is no locking and revalidation. Usually, we prevent
unpleasant I/O with lock_page or PG_writeback, and an obsolete page is
revalidated under lock_page.

For page forking, we may also be able to prevent a similar situation
with locking, flags, and revalidation. But those details might differ
from the current code, because the page states are different.

> Another possible scenario:
>
> 1) Userspace app tells kernel to setup a HW buffer in a page.
>
> 2) Userspace app fills page with data -> page_mkwrite is called, page is
> dirtied.
>
> 3) Userspace app tells kernel to ship buffer to video HW.
>
> 4) Writeback comes and starts writeback for the page
>
> 5) Video HW is done with the page. Userspace app fills new set of data into
> the page -> page_mkwrite is called, page is forked.
>
> 6) Userspace app tells kernel to ship buffer to video HW. But HW gets the
> old data from the original page.
>
> Again a data corruption issue where previously things were working fine.

This logic seems to be the same as above. Replace the page fork at (5)
as before; with no revalidation of the page, (6) will use the old page.

Thanks.
--
OGAWA Hirofumi
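
For readers following the thread, here is a minimal userspace sketch of
the "lock, then revalidate" idea referred to in the reply. It is a toy
model in Python, not kernel code: Page, PageCache, replace() and
lock_valid_page() are hypothetical names, threading.Lock only stands in
for lock_page(), and the page.mapping check stands in for "is this page
still attached to its radix tree?". replace() models any operation that
swaps the page out (truncate, invalidate, or a page fork).

```python
import threading


class Page:
    """Toy page: a lock plus a back-pointer to the cache that owns it."""

    def __init__(self, index):
        self.index = index
        self.lock = threading.Lock()  # stands in for lock_page()/unlock_page()
        self.mapping = None           # owning cache, or None once detached


class PageCache:
    """Toy index -> page map, standing in for the radix tree."""

    def __init__(self):
        self._tree_lock = threading.Lock()
        self._pages = {}

    def attach(self, index):
        page = Page(index)
        page.mapping = self
        with self._tree_lock:
            self._pages[index] = page
        return page

    def lookup(self, index):
        with self._tree_lock:
            return self._pages.get(index)

    def replace(self, index):
        """Detach the current page and install a fresh one (assumes it exists)."""
        old = self.lookup(index)
        with old.lock:                # detach only while holding the page lock
            old.mapping = None
            new = Page(index)
            new.mapping = self
            with self._tree_lock:
                self._pages[index] = new
        return new


def lock_valid_page(cache, page, index):
    """Lock `page`; if it was detached in the meantime, retry with the current one."""
    while True:
        page.lock.acquire()
        if page.mapping is cache:
            return page               # returned locked and still attached
        page.lock.release()           # stale reference: drop it and look up again
        page = cache.lookup(index)
        if page is None:
            raise LookupError(f"no page at index {index}")


if __name__ == "__main__":
    cache = PageCache()
    stale = cache.attach(0)           # reference taken before the swap
    fresh = cache.replace(0)          # page swapped behind the holder's back
    page = lock_valid_page(cache, stale, 0)
    assert page is fresh              # the stale reference was not used
    page.lock.release()
```

A caller that skips the check after taking the lock would keep using the
detached page, which is the failure both quoted scenarios describe.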