Date: Tue, 25 Sep 2018 13:48:29 +0300
From: "Kirill A. Shutemov"
To: Yury Norov
Cc: Andrew Morton, Al Viro, Dan Williams, Huang Ying,
    "Michael S. Tsirkin", Michel Lespinasse, Souptick Joarder,
    Willy Tarreau, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] mm: fix COW faults after mlock()
Message-ID: <20180925104829.jld5xd6evr7uhwfw@kshutemo-mobl1>
References: <20180924130852.12996-1-ynorov@caviumnetworks.com>
    <20180924212246.vmmsmgd5qw6xkfwh@kshutemo-mobl1>
    <20180924234843.GA23726@yury-thinkpad>
In-Reply-To: <20180924234843.GA23726@yury-thinkpad>

On Tue, Sep 25, 2018 at 02:48:43AM +0300, Yury Norov wrote:
> On Tue, Sep 25, 2018 at 12:22:47AM +0300, Kirill A.
> Shutemov wrote:
> > On Mon, Sep 24, 2018 at 04:08:52PM +0300, Yury Norov wrote:
> > > After mlock() on newly mmap()ed shared memory I observe page faults.
> > >
> > > The problem is that populate_vma_page_range() doesn't set FOLL_WRITE
> > > flag for writable shared memory in mlock() path, arguing that like:
> > > /*
> > >  * We want to touch writable mappings with a write fault in order
> > >  * to break COW, except for shared mappings because these don't COW
> > >  * and we would not want to dirty them for nothing.
> > >  */
> > >
> > > But they are actually COWed. The most straightforward way to avoid it
> > > is to set FOLL_WRITE flag for shared mappings as well as for private ones.
> >
> > Huh? How do shared mapping get CoWed?
> >
> > In this context CoW means to create a private copy of the page for the
> > process. It only makes sense for private mappings as all pages in shared
> > mappings do not belong to the process.
> >
> > Shared mappings will still get faults, but a bit later -- after the page
> > is written back to disc, the page get clear and write protected to catch
> > the next write access.
> >
> > Noticeable exception is tmpfs/shmem. These pages do not belong to normal
> > write back process. But the code path is used for other filesystems as
> > well.
> >
> > Therefore, NAK. You only create unneeded write back traffic.
>
> Hi Kirill,
>
> (My first reaction was exactly like yours indeed, but) on my real
> system (Cavium OcteonTX2), and on my qemu simulation I can reproduce
> the same behavior: just mlock()ed memory causes faults. That faults
> happen because page is mapped to the process as read-only, while
> underlying VMA is read-write. So faults get resolved well by just
> setting write access to the page.

mlock() doesn't guarantee that you'll never get a *minor* fault. Write
back or page migration will get these pages write-protected.

Making pages write-protected is what we rely on for proper dirty
accounting: filesystems need to know when a page gets dirty so they can
allocate resources to write the page back properly. Once the page is
written back to storage, it gets write-protected again to catch the next
write access to the page.

I guess we can improve the situation a bit for shmem/tmpfs: we can
populate such shared mappings with FOLL_WRITE. But this patch is not good
for the task.

-- 
 Kirill A. Shutemov
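
A minimal userspace sketch of how the minor faults discussed in this thread could be observed. It maps a temporary file MAP_SHARED, mlock()s the range, writes one byte per page, and reports how much the process's minor-fault counter (ru_minflt from getrusage()) moved. The helper names, the /tmp path and the error handling are assumptions made for illustration; this is not the reproducer used by either poster.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Minor faults taken by this process so far, or -1 on error. */
long minor_faults(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_minflt;
}

/*
 * Map len bytes of a temporary file MAP_SHARED, mlock() the range,
 * then write one byte to every page.  Returns the number of minor
 * faults taken while writing, or -1 if any setup step fails (for
 * example when mlock() is refused because of RLIMIT_MEMLOCK).
 */
long faults_writing_mlocked_shared(size_t len)
{
	char path[] = "/tmp/mlock-demo-XXXXXX";	/* hypothetical location */
	long page = sysconf(_SC_PAGESIZE);
	long before, after;
	volatile char *p;
	void *map;
	int fd;

	if (page <= 0 || len == 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);		/* the open descriptor keeps the file alive */

	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);		/* the mapping holds its own reference */
	if (map == MAP_FAILED)
		return -1;

	if (mlock(map, len) != 0) {
		munmap(map, len);
		return -1;
	}

	/* volatile keeps the compiler from dropping the stores */
	p = map;
	before = minor_faults();
	for (size_t off = 0; off < len; off += (size_t)page)
		p[off] = 1;
	after = minor_faults();

	munmap(map, len);
	if (before < 0 || after < 0)
		return -1;
	return after - before;
}
```

Calling faults_writing_mlocked_shared(8 * sysconf(_SC_PAGESIZE)) from a small main() and printing the result is enough to try it. The count is expected to depend on the filesystem backing /tmp (a tmpfs mount may behave differently from a disk filesystem that relies on write-protection for dirty accounting) and on the kernel version, so no particular value should be assumed.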