Message-ID: <3563a3e8-b971-b604-7388-766ecfce4634@redhat.com>
Date: Fri, 15 Oct 2021 18:39:49 +0200
Subject: Re: [PATCH v10 3/3] mm: add anonymous vma name refcounting
To: Suren Baghdasaryan
Cc: Michal Hocko, Kees Cook, Pavel Machek, Rasmus Villemoes,
    John Hubbard, Andrew Morton, Colin Cross, Sumit Semwal, Dave Hansen,
    Matthew Wilcox, "Kirill A . Shutemov", Vlastimil Babka,
    Johannes Weiner, Jonathan Corbet, Al Viro, Randy Dunlap, Kalesh Singh,
    Peter Xu, rppt@kernel.org, Peter Zijlstra, Catalin Marinas,
    vincenzo.frascino@arm.com, Chinwen Chang (張錦文), Axel Rasmussen,
    Andrea Arcangeli, Jann Horn, apopple@nvidia.com, Yu Zhao, Will Deacon,
    fenghua.yu@intel.com, thunder.leizhen@huawei.com, Hugh Dickins,
    feng.tang@intel.com, Jason Gunthorpe, Roman Gushchin, Thomas Gleixner,
    krisman@collabora.com, Chris Hyser, Peter Collingbourne,
    "Eric W. Biederman", Jens Axboe, legion@kernel.org, Rolf Eike Beer,
    Cyrill Gorcunov, Muchun Song, Viresh Kumar, Thomas Cedeno,
    sashal@kernel.org, cxfcosmos@gmail.com, LKML,
    linux-fsdevel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm,
    kernel-team
References: <92cbfe3b-f3d1-a8e1-7eb9-bab735e782f6@rasmusvillemoes.dk>
    <20211007101527.GA26288@duo.ucw.cz>
    <202110071111.DF87B4EE3@keescook>
    <202110081344.FE6A7A82@keescook>
    <26f9db1e-69e9-1a54-6d49-45c0c180067c@redhat.com>
From: David Hildenbrand
Organization: Red Hat
X-Mailing-List: linux-kernel@vger.kernel.org

>>>
>>> 1. Forking a process with anonymous vmas named using memfd is 5-15%
>>> slower than with prctl (depends on the number of VMAs in the process
>>> being forked). Profiling shows that i_mmap_lock_write() dominates
>>> dup_mmap(). Exit path is also slower by roughly 9% with
>>> free_pgtables() and fput() dominating exit_mmap(). Fork performance is
>>> important for Android because almost all processes are forked from
>>> zygote, therefore this limitation already makes this approach
>>> prohibitive.
>>
>> Interesting, naturally I wonder if that can be optimized.
>
> Maybe but it looks like we simply do additional things for file-backed
> memory, which seems natural.
> The call to i_mmap_lock_write() is from
> here: https://elixir.bootlin.com/linux/latest/source/kernel/fork.c#L565
>
>>
>>>
>>> 2. mremap() usage to grow the mapping has an issue when used with memfds:
>>>
>>> fd = memfd_create(name, MFD_ALLOW_SEALING);
>>> ftruncate(fd, size_bytes);
>>> ptr = mmap(NULL, size_bytes, prot, MAP_PRIVATE, fd, 0);
>>> close(fd);
>>> ptr = mremap(ptr, size_bytes, size_bytes * 2, MREMAP_MAYMOVE);
>>> touch_mem(ptr, size_bytes * 2);
>>>
>>> This would generate a SIGBUS in touch_mem(). I believe it's because
>>> ftruncate() specified the size to be size_bytes and we are accessing
>>> more than that after remapping. prctl() does not have this limitation
>>> and we do have a usecase for growing a named VMA.
>>
>> Can't you simply size the memfd much larger? I mean, it doesn't really
>> cost much, does it?
>
> If we know beforehand what the max size it can reach then that would
> be possible. I would really hate to miscalculate here and cause a
> simple memory access to generate signals. Tracking such corner cases
> in the field is not an easy task and I would rather avoid the
> possibility of it.

The question would be if you cannot simply add some extremely large
number, because the file size itself doesn't really matter for memfd IIRC.

That said, without trying it out, I wouldn't know off the top of my
head if mremap() would work that way on an already closed fd that has
a sufficient size :/ If you have the example still somewhere, I would
be interested if that would work in general.

[...]

>>
>>>
>>> 4. There is a usecase in the Android userspace where vma naming
>>> happens after memory was allocated. Bionic linker does in-memory
>>> relocations and then names some relocated sections.
>>
>> Would renaming a memfd be an option or is that "too late" ?
>
> My understanding is that linker allocates space to load and relocate
> the code, performs the relocations in that space and then names some
> of the regions after that.
> Whether it can be redesigned to allocate
> multiple named regions and perform the relocation between them I did
> not really try since it would be a project by itself.
>
> TBH, at some point I just look at the amount of required changes (both
> kernel and userspace) and new limitations that userspace has to adhere
> to for fitting memfds to my usecase, and I feel that it's just not
> worth it. In the end we end up using the same refcounted strings with
> vma->vm_file->f_count as the refcount and name stored in
> vma->vm_file->f_path.dentry but with more overhead.

Yes, but it's glued to files which naturally have names :)

Again, I appreciate that you looked into alternatives! I can see the
late renaming could be the biggest blocker if user space cannot be
adjusted easily to be compatible with that using memfds.

--
Thanks,

David / dhildenb
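
For reference, the experiment asked about under point 2 (size the memfd
well beyond the initial mapping, close the fd, then grow the mapping
with mremap()) could be sketched roughly as below. This is only a
sketch under assumptions, not code from the thread: the helper name
grow_private_memfd_mapping() and its parameters are made up for
illustration, it assumes Linux with a glibc that provides
memfd_create(), and whether it behaves as hoped should be checked on
the target kernel.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create a memfd whose file
 * size is file_bytes, map the first map_bytes privately, close the fd,
 * double the mapping with mremap() and write to every byte of it.
 *
 * Returns 0 if all calls succeed, -1 otherwise. If file_bytes is smaller
 * than 2 * map_bytes, the memset() below is expected to raise SIGBUS,
 * as described in the report quoted above.
 */
int grow_private_memfd_mapping(const char *name, size_t map_bytes,
                               off_t file_bytes)
{
    int fd = memfd_create(name, MFD_ALLOW_SEALING);
    if (fd < 0) {
        perror("memfd_create");
        return -1;
    }

    /* Oversize the file; it is sparse until pages are actually touched. */
    if (ftruncate(fd, file_bytes) < 0) {
        perror("ftruncate");
        close(fd);
        return -1;
    }

    void *ptr = mmap(NULL, map_bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping holds its own reference to the file */
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    void *grown = mremap(ptr, map_bytes, map_bytes * 2, MREMAP_MAYMOVE);
    if (grown == MAP_FAILED) {
        perror("mremap");
        munmap(ptr, map_bytes);
        return -1;
    }

    /* Touch the whole grown range, including the newly added half. */
    memset(grown, 0xA5, map_bytes * 2);

    munmap(grown, map_bytes * 2);
    return 0;
}
```

Calling it with, say, a 64 KiB mapping and a 1 GiB file size exercises
the oversized case; passing file_bytes equal to map_bytes instead
reproduces the sequence from point 2.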