Subject: Re: [RFC PATCH] userfaultfd: Add feature to request for a signal delivery
From: Prakash Sangappa
To: Michal Hocko
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli, Mike Rapoport, Mike Kravetz, Dave Hansen, Christoph Hellwig
Date: Tue, 27 Jun 2017 08:47:38 -0700
Message-ID: <46792166-898b-47b7-ccd1-e128511b21ee@oracle.com>
In-Reply-To: <20170627070643.GA28078@dhcp22.suse.cz>

On 6/27/17 12:06 AM, Michal Hocko wrote:
> This is a user-visible API, so let's CC the linux-api mailing list.
>
> On Mon 26-06-17 12:46:13, Prakash Sangappa wrote:
>> In some cases, the userfaultfd mechanism should just deliver a SIGBUS
>> signal to the faulting process instead of a page-fault event. Handling
>> page-fault events in a monitor thread can be an overhead in these
>> cases. For example, applications such as databases could use the
>> signaling mechanism for robustness.
> This is rather confusing. What is the reason that the monitor would be
> slower than signal delivery and handling?

There are a large number of single-threaded database processes involved,
and each of them would require its own monitor thread, which is
considered overhead.

>
>> The database uses hugetlbfs for performance reasons. Files on the
>> hugetlbfs filesystem are created, and huge pages are allocated, using
>> the fallocate() API. Pages are deallocated/freed using fallocate() hole
>> punching support. These files are mmapped and accessed by many
>> processes as shared memory. The database keeps track of which offsets
>> in the hugetlbfs file have pages allocated.
>>
>> Any access to a mapped address over a hole in the file, which can
>> occur due to bugs in the application, is considered invalid, and the
>> process is expected to simply receive a SIGBUS. However, currently,
>> when a hole in the file is accessed via the mapped address, kernel/mm
>> attempts to automatically allocate a page at page fault time, which
>> implicitly fills the hole in the file. This may not be the desired
>> behavior for applications like the database that want to explicitly
>> manage page allocations of hugetlbfs files.
> So you register UFFD_FEATURE_SIGBUS on each region that you are
> unmapping and then just let those offenders die?

The database application will create the mapping and register it with
userfaultfd. Subsequently, any process that accesses the mapping over a
hole will receive a SIGBUS and die.

>
>> Using the userfaultfd mechanism with this support for getting a signal,
>> the database application can prevent pages from being allocated
>> implicitly when processes access a mapped address over a hole in the
>> file.
>>
>> This patch adds a feature to the userfaultfd mechanism to request a
>> SIGBUS signal.
>>
>> See the following for the previous discussion about the database
>> requirement that led to this proposal, as suggested by Andrea.
>>
>> http://www.spinics.net/lists/linux-mm/msg129224.html
> Please make those requirements part of the changelog.

The requirement is the one described above: the database application
needs holes in the file not to be filled implicitly. Sorry if this was
not clear. I will update the changelog and send a v2 patch.
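
For illustration only, a minimal sketch of the allocate/punch-hole bookkeeping described above. It is not the database's code: `struct page_map`, the `pm_*` helpers and the 64-page limit are hypothetical, and the 2 MiB huge page size is an assumption (the x86-64 default). It uses memfd_create() (with MFD_HUGETLB for a hugetlbfs-backed file) instead of a file on a hugetlbfs mount, so it can also run on plain tmpfs.

```c
#define _GNU_SOURCE
#include <fcntl.h>         /* fallocate() with _GNU_SOURCE */
#include <linux/falloc.h>  /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>      /* memfd_create(), MFD_HUGETLB */
#include <unistd.h>

#define PM_MAX_PAGES 64    /* hypothetical limit: one bit per page in a uint64_t */

/* Hypothetical bookkeeping for one shared file: which page-sized slots
 * currently have backing pages. */
struct page_map {
    int fd;
    size_t page_size;
    uint64_t allocated;    /* bit i set => page i has been fallocate()d */
};

/* Create the backing file. hugetlb=true gives a hugetlbfs-backed memfd and
 * ASSUMES a 2 MiB huge page size; otherwise a tmpfs memfd with base pages. */
static int pm_create(struct page_map *pm, bool hugetlb)
{
    pm->fd = memfd_create("db-shared", hugetlb ? MFD_HUGETLB : 0);
    pm->page_size = hugetlb ? (size_t)2 << 20 : (size_t)sysconf(_SC_PAGESIZE);
    pm->allocated = 0;
    return pm->fd < 0 ? -1 : 0;
}

/* Explicitly allocate page idx; this extends the file size if needed. */
static int pm_alloc(struct page_map *pm, size_t idx)
{
    if (idx >= PM_MAX_PAGES)
        return -1;
    if (fallocate(pm->fd, 0, (off_t)(idx * pm->page_size),
                  (off_t)pm->page_size) < 0)
        return -1;
    pm->allocated |= UINT64_C(1) << idx;
    return 0;
}

/* Free page idx by punching a hole. KEEP_SIZE leaves the file size alone,
 * so existing mappings of the file still cover this offset. */
static int pm_free(struct page_map *pm, size_t idx)
{
    if (idx >= PM_MAX_PAGES)
        return -1;
    if (fallocate(pm->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  (off_t)(idx * pm->page_size), (off_t)pm->page_size) < 0)
        return -1;
    pm->allocated &= ~(UINT64_C(1) << idx);
    return 0;
}

static bool pm_is_allocated(const struct page_map *pm, size_t idx)
{
    return idx < PM_MAX_PAGES && ((pm->allocated >> idx) & 1) != 0;
}

static void pm_destroy(struct page_map *pm)
{
    if (pm->fd >= 0)
        close(pm->fd);
    pm->fd = -1;
}
```

Because the hole is punched with FALLOC_FL_KEEP_SIZE, a mapping of the file still covers the freed offset; without userfaultfd, touching it would allocate a fresh page and silently fill the hole, which is the behavior the patch wants to turn into a SIGBUS.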
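
A similarly hedged sketch of how a process might opt in to the proposed behavior, assuming kernel headers that define UFFD_FEATURE_SIGBUS (the name used in this thread). The helper names are hypothetical. For portability it registers a private anonymous page rather than a shared hugetlbfs mapping; a hugetlbfs range would go through the same UFFDIO_REGISTER call on kernels that support userfaultfd on hugetlbfs. The offending access runs in a forked child that sets up its own mapping and userfaultfd, so the caller can observe the SIGBUS without dying itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <signal.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open a userfaultfd whose missing-page faults raise SIGBUS in the faulting
 * thread instead of queueing an event for a monitor thread.
 * Returns the fd, or -1 if unsupported, not permitted, or blocked. */
static int uffd_open_sigbus(void)
{
    int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC);
#ifdef UFFD_USER_MODE_ONLY
    /* Kernels >= 5.11 may require this flag for unprivileged callers. */
    if (fd < 0 && errno == EPERM)
        fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
#endif
    if (fd < 0)
        return -1;

    struct uffdio_api api = { .api = UFFD_API, .features = UFFD_FEATURE_SIGBUS };
    if (ioctl(fd, UFFDIO_API, &api) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Register [addr, addr + len) for missing-page faults. */
static int uffd_register_missing(int fd, void *addr, size_t len)
{
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_MISSING,
    };
    return ioctl(fd, UFFDIO_REGISTER, &reg);
}

/* Fork a child that maps one anonymous page, registers it, then writes to
 * it without populating it first. Returns 1 if the child was killed by
 * SIGBUS, 0 if the write succeeded, -1 if setup was not possible here. */
static int touch_hole_dies_with_sigbus(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            _exit(2);
        int fd = uffd_open_sigbus();
        if (fd < 0 || uffd_register_missing(fd, p, len) < 0)
            _exit(2);
        ((volatile char *)p)[0] = 1;  /* the "offender" touches a missing page */
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGBUS)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 0;
    return -1;
}
```

Where the feature is available, touch_hole_dies_with_sigbus() should return 1; it returns -1 where userfaultfd cannot be used at all (for example, blocked by a seccomp policy). No monitor thread is involved: the kernel raises SIGBUS directly in the faulting thread, which is what makes this attractive for many single-threaded processes.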