Received: by 2002:a25:86ce:0:0:0:0:0 with SMTP id y14csp1386753ybm; Wed, 22 May 2019 23:38:23 -0700 (PDT) X-Google-Smtp-Source: APXvYqwot72u6OAsmalbT46PhEPhkFmSp6x+YwiIY3O5uLKINn8KftoCztORXPn0E4yGUFf2XTIh X-Received: by 2002:a62:2c17:: with SMTP id s23mr71104347pfs.51.1558593503359; Wed, 22 May 2019 23:38:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1558593503; cv=none; d=google.com; s=arc-20160816; b=OYTEwwU0joGuAfvUqJLJfHXvAJxG0VJiPvzRdgo9Svw+5HCAVZb+a+a1BSiwxSCgp/ lFJ/3y0rgixpqhsnqnA8PCEqpoaR1eSJUGXMnNmUDWTr6h4SQHhv5QqEGmTkpqdxPnYt A467bHHAi9QN7YEWv/VjsAbasmi2wVraVeqST0zqLdZOc/R80Etmoarh5taK6XXWoGw+ Q2/doKR0qJiBY95NRlXZZoA6VFKYZt3wV70f6UqHHz8xj3Dn2sNHQoQEw4jQ5IlO1cG5 CUpAc08iTjNlABSgR/6nxICJiot/aTiGYlq7Och7UZpKIKx/OaaIJC8ieZ1LDVubshcT UMZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:dkim-signature:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=DMgMenbiYkWeMSHYgllY0KX56kuCMPb7MQ3jm5E4nqE=; b=lpm8l/fGPbDaYaVmFMP2EqFANgvEthQbmhAktNfiBIzW1kdNDzty3/BuuuN/CVQdKy 3bhMryyvZvct+05aSicymCiXtGzaR45yM6qALjPHFavbAli4dRh6Lc2NXNtRKiWLww/G CtUlap0e7wEj+shB3t+G7XDnV/l1IdHXe+X5u6Mff6GXfFuCa6x76ix/zy93RZqB9kKI pTJdYBSWh+YyVwm74+GwGdrxG/QfeW0FC+OigpJWLncBGJifalKlvQKnZEC91rfpa1z6 FMG75kTuMlrg+zufSKVS02mGQtGR/5IhSpY6j4w0Ova7XYfywLux8MWJBB/1kTIl5p3j jrQA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@nvidia.com header.s=n1 header.b=Wpc8uOrT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a17si27976994pgb.25.2019.05.22.23.37.36; Wed, 22 May 2019 23:38:23 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@nvidia.com header.s=n1 header.b=Wpc8uOrT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=nvidia.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726440AbfEWGdv (ORCPT + 99 others); Thu, 23 May 2019 02:33:51 -0400 Received: from hqemgate14.nvidia.com ([216.228.121.143]:16451 "EHLO hqemgate14.nvidia.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725814AbfEWGdv (ORCPT ); Thu, 23 May 2019 02:33:51 -0400 Received: from hqpgpgate102.nvidia.com (Not Verified[216.228.121.13]) by hqemgate14.nvidia.com (using TLS: TLSv1.2, DES-CBC3-SHA) id ; Wed, 22 May 2019 23:33:51 -0700 Received: from hqmail.nvidia.com ([172.20.161.6]) by hqpgpgate102.nvidia.com (PGP Universal service); Wed, 22 May 2019 23:33:50 -0700 X-PGP-Universal: processed; by hqpgpgate102.nvidia.com on Wed, 22 May 2019 23:33:50 -0700 Received: from [10.2.169.219] (10.124.1.5) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1473.3; Thu, 23 May 2019 06:33:49 +0000 Subject: Re: [PATCH 5/5] mm/hmm: Fix mm stale reference use in hmm_free() To: Jason Gunthorpe , Ralph Campbell CC: , , Ira Weiny , Dan Williams , Arnd Bergmann , Balbir Singh , Dan Carpenter , Matthew Wilcox , Souptick Joarder , Andrew Morton References: <20190506233514.12795-1-rcampbell@nvidia.com> <20190522233628.GA16137@ziepe.ca> <2938d2da-424d-786e-5486-1e4fa9f58425@nvidia.com> <20190523012504.GG15389@ziepe.ca> From: John Hubbard X-Nvconfidentiality: public Message-ID: <0994464b-8e59-c227-0b67-00ddd9c83943@nvidia.com> Date: Wed, 22 May 2019 23:32:53 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: <20190523012504.GG15389@ziepe.ca> X-Originating-IP: [10.124.1.5] X-ClientProxiedBy: HQMAIL104.nvidia.com (172.18.146.11) To HQMAIL107.nvidia.com (172.20.187.13) Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=nvidia.com; s=n1; t=1558593231; bh=DMgMenbiYkWeMSHYgllY0KX56kuCMPb7MQ3jm5E4nqE=; h=X-PGP-Universal:Subject:To:CC:References:From:X-Nvconfidentiality: Message-ID:Date:User-Agent:MIME-Version:In-Reply-To: X-Originating-IP:X-ClientProxiedBy:Content-Type:Content-Language: Content-Transfer-Encoding; b=Wpc8uOrTOwt5LsR4mYCEpKSR3W18iqjaOJShap9QBNSQwpiwqktG7gS+nRRAeCNTI z51b+X+xRvcEzjdH497mx/qR//Y5p2pxbCAUnuQiDckr0CMcMgzIm321CP0j4qMwrK vTaVKBHknoV3n93dwec2nIJ24KUbd4FYrsXcHTeYrFFLmdecMoeWl+ysPIYGwwuf6p x7rc+PPigxXuBOWrXD4CtYYYzoj3C1a9gDQpfAgRx/XWDxCfJJnEeSbxeKZIPaiA7N lw9Ipfg+lCZqyxljJso/m4GFf3DalwjB2MR7RjPXqIBG9bqKgjuWm0gwLn7y5T3TYB P1hLkZoEPSR/g== Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 5/22/19 6:25 PM, Jason Gunthorpe wrote: > On Wed, May 22, 2019 at 05:54:17PM -0700, Ralph Campbell wrote: >> >> On 5/22/19 4:36 PM, Jason Gunthorpe wrote: >>> On Mon, May 06, 2019 at 04:35:14PM -0700, rcampbell@nvidia.com wrote: >>>> From: Ralph Campbell >>>> >>>> The last reference to struct hmm may be released long after the mm_struct >>>> is destroyed because the struct hmm_mirror memory may be part of a >>>> device driver open file private data pointer. The file descriptor close >>>> is usually after the mm_struct is destroyed in do_exit(). This is a good >>>> reason for making struct hmm a kref_t object [1] since its lifetime spans >>>> the life time of mm_struct and struct hmm_mirror. >>> >>>> The fix is to not use hmm->mm in hmm_free() and to clear mm->hmm and >>>> hmm->mm pointers in hmm_destroy() when the mm_struct is >>>> destroyed. >>> >>> I think the right way to fix this is to have the struct hmm hold a >>> mmgrab() on the mm so its memory cannot go away until all of the hmm >>> users release the struct hmm, hmm_ranges/etc >>> >>> Then we can properly use mmget_not_zero() instead of the racy/abnormal >>> 'if (hmm->xmm == NULL || hmm->dead)' pattern (see the other >>> thread). Actually looking at this, all these tests look very >>> questionable. If we hold the mmget() for the duration of the range >>> object, as Jerome suggested, then they all get deleted. >>> >>> That just leaves mmu_notifier_unregister_no_relase() as the remaining >>> user of hmm->mm (everyone else is trying to do range->mm) - and it >>> looks like it currently tries to call >>> mmu_notifier_unregister_no_release on a NULL hmm->mm and crashes :( >>> >>> Holding the mmgrab fixes this as we can safely call >>> mmu_notifier_unregister_no_relase() post exit_mmap on a grab'd mm. >>> >>> Also we can delete the hmm_mm_destroy() intrustion into fork.c as it >>> can't be called when the mmgrab is active. >>> >>> This is the basic pattern we used in ODP when working with mmu >>> notifiers, I don't know why hmm would need to be different. +1 for the mmgrab() approach. I have never been able to see how these various checks can protect anything, and refcounting it into place definitely sounds like the right answer. thanks, -- John Hubbard NVIDIA