Received: by 2002:a89:48b:0:b0:1f5:f2ab:c469 with SMTP id a11csp677902lqd; Wed, 24 Apr 2024 13:31:40 -0700 (PDT) X-Forwarded-Encrypted: i=3; AJvYcCXQvNhJ4W+J7YkermFkGyRjcHv1+anjfzc72aOKEAvj5j1BFNoZeHNcfGRYEqr+PzReGgn+V4OvWNwJpRx0hppZPRaZoTB0EitXv8ymOg== X-Google-Smtp-Source: AGHT+IH1u3zfLoqtkSoR3M0Fdiozdlq+gMT4U6ycj71zIGFEGuV852UikKWO4+fI1yfi4Uprkt8Z X-Received: by 2002:a17:903:2451:b0:1ea:9585:a07b with SMTP id l17-20020a170903245100b001ea9585a07bmr1879123pls.0.1713990700000; Wed, 24 Apr 2024 13:31:40 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1713990699; cv=pass; d=google.com; s=arc-20160816; b=IOBF9Y4XSNJLDUnHo5WUTHlYNfA1r8sWAiNhdkX5wfDm1TkigaeaTQo7+Nf8jtYm+q u02Dwqz4RBMyQMsydmIQ1Z7GmdWirrydQE9kBpsPMMnjwxYQz1rmZYs86WLCo5Rw5lkw asw2ih0rWVT963tYN7JMpQ1iLlGcHoSK2JRT5tt/0ZDKgAjUkLp0iBfurpbs3Uoa6A90 R1MqY2oTrOsGhe3TbZd99X3A0chWwpK7SXJFusmAoS8WyrfbJQDQJq01xivTs9lBBi8R 6IHVzVDDTsBfmlU5GTeFz2DD40Jd3vZzCbvMnWXSQRgA+qORWDUoA+pT98FOxtpG7+yN 0ilA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=in-reply-to:content-disposition:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:message-id:subject:cc :to:from:date:dkim-signature; bh=Z+e+2WTr+55fMN46zgpBMyY5FEksh29xdSRSoFD8QA8=; fh=Z/J7P8Vou0pI6Kh5xn7InIClNWVrBRNI113V/vh3vGM=; b=jzpvKigtwlpUrujXhlOjT87byQYHgB+pqWr0HHuf6VIYIHAvscLh6RqAfFmXzVF+e7 H/udbkciws8/5D9UEFRXfc1Js6ll+a2H6EWG3RHS8fWPyXxNyWYvMVzH/HTqz8ZPT3P/ nCprahxfqI+kZr3KmGSfQko/MIRVYYcwwRWiMnZLVyfXAGa9IdzWHDDwgRcrlz2Q1ZyY S76UdcLFzCOMYwxx3yRNd3Mvd7Gj1YZ0rGWjutApTirmS2jCb07L5Ks+PZ+IiwklR7hK Up9tHcuzWC+ILLHkwYJ7/db7HG+F/F1TSf3V6beszq6MuqW841tw2e83Z0fMV4OI4ofz RXkw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=vYv5SZQr; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-157686-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-157686-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id j8-20020a170903024800b001e589a2f9fbsi11783500plh.248.2024.04.24.13.31.39 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 13:31:39 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel+bounces-157686-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=vYv5SZQr; arc=pass (i=1 spf=pass spfdomain=google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-157686-linux.lists.archive=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-157686-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 5D22F287870 for ; Wed, 24 Apr 2024 20:31:39 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 626C014D293; Wed, 24 Apr 2024 20:31:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="vYv5SZQr" Received: from mail-wr1-f54.google.com (mail-wr1-f54.google.com [209.85.221.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CFBC614C594 for ; Wed, 24 Apr 2024 20:31:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.54 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713990696; cv=none; b=TBVVUyKbRAzmWlGqKTk7rukCLYsolFyGR3c4dm51Ee3sbUna/noKcXpDic5N4nnPpTV39cPY96fHgCb95krV55dU3eRMbnM0xdCfslgP+pycBMTMLOQ79bk2/0KH8D6v7gYcOl7rBrtqfDHrXMewkXSVS7IDP6AyR38SqX2OJ74= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1713990696; c=relaxed/simple; bh=CtKCUQB4Kql7tmaWP27sTCH5+nmqkV3XR81u4Dirqt4=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=HJOrI7geH68gYNdd4TygrxHT8Ggfp6ntymtMV70nBkl9+6MUkKr4qItzyOwYz435/HQmCC5Juxp0FDA5Da0z2A1465kwo6jLJDIFRAftnwq3XirVat1TxP9cKbQ78OnkK6RJob7shAzhHhbHk/HbiArZN7Cua/y2pOYbhS9u82o= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=vYv5SZQr; arc=none smtp.client-ip=209.85.221.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=google.com Received: by mail-wr1-f54.google.com with SMTP id ffacd0b85a97d-34b1e35155aso208543f8f.3 for ; Wed, 24 Apr 2024 13:31:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1713990693; x=1714595493; darn=vger.kernel.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=Z+e+2WTr+55fMN46zgpBMyY5FEksh29xdSRSoFD8QA8=; b=vYv5SZQr4ZmBL4R7NDR/iUpjAIoLvukjMu3HB/xnTTdJcAftLO9uEraM8nj9Eki8XI jvmQownnfIx0EsPiElD5QXKC2jx+8bm/+nugeY/9ZPno6EoysJL+GGcQL5Fai5h0z5Bt 5J8uImkvpqXVhrcvXNEysMb8tlLinyJcr+CirU7vflV8/gB0R6q3UtShTSJKjjdav8uF D6ADHWhIIIbO3Q+FruvDgALEmdGJeDWBzIgc49JlbUN/zSAK+mMS7eG88qHB0wAtOzkx Ml6rTYSkunFa4ke52Fqn78iPR7AZpmQbmWQq5XLmNYKPyxIqrwITyLkdJMewQH3+MAUk +oGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1713990693; x=1714595493; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=Z+e+2WTr+55fMN46zgpBMyY5FEksh29xdSRSoFD8QA8=; b=Aa/Jg26K81QKA3zC9p+Qi4m567DZQ6w0+8z4MBICFyx+28OQnhakA6n4e4H4DewNxP UdM72mxAzCl7PHc6a+RX08XEBImk7mNzLK/cE5GDz2VYAAHZ4BwfpELSZynrjw+5sHtW G6TBlb3OVKdjp+DuXNyZv2G2Xp20ygzdWMccTWoi4bOpUkLTb+Jl8TODlPXyHdJAbBvg 7zLlnS62suw8ZNiBK7g0ApCd/ekieIBp77gzKsMHyYSSFoUaWCU8o6nJ7t3BRINRMqLu nz2aphIf5zFkQXGrzIj9QuwfchUE9+LaZUMwV+mD15PP078tyxmkNt7Uir+TZL255st/ iaWA== X-Forwarded-Encrypted: i=1; AJvYcCW/wOmy/UCd+uMbI+McFDyhapg1lgEMICra6FNf9tfhy6X+Mmp9OjfGTPAVKHqIUjhrZLy04y/mdDZn0aUQJYrtH8/2ijv7pDYnf0ya X-Gm-Message-State: AOJu0Yz5X0Ojcu3AOuqHnca+Jei+Sij6bQvV8GAtWQqh19hSLS0Nj/aP tdLpRzK2yof1ElsGLmV4ZE3OFDLSMoYXTd6VQS9dZ3cg3TcgtGUwgDgrb70t0g== X-Received: by 2002:a05:600c:3b26:b0:418:a706:3209 with SMTP id m38-20020a05600c3b2600b00418a7063209mr3509802wms.31.1713990693060; Wed, 24 Apr 2024 13:31:33 -0700 (PDT) Received: from google.com (88.140.78.34.bc.googleusercontent.com. [34.78.140.88]) by smtp.gmail.com with ESMTPSA id v18-20020a5d43d2000000b0034a25339e47sm17107793wrr.69.2024.04.24.13.31.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 Apr 2024 13:31:32 -0700 (PDT) Date: Wed, 24 Apr 2024 21:31:28 +0100 From: Vincent Donnefort To: David Hildenbrand Cc: rostedt@goodmis.org, mhiramat@kernel.org, linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mathieu.desnoyers@efficios.com, kernel-team@android.com, rdunlap@infradead.org, rppt@kernel.org, linux-mm@kvack.org Subject: Re: [PATCH v21 2/5] ring-buffer: Introducing ring-buffer mapping functions Message-ID: References: <20240423232728.1492340-1-vdonnefort@google.com> <20240423232728.1492340-3-vdonnefort@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Hi David, Thanks for your quick response. On Wed, Apr 24, 2024 at 05:26:39PM +0200, David Hildenbrand wrote: > > I gave it some more thought, and I think we are still missing something (I > wish PFNMAP/MIXEDMAP wouldn't be that hard). > > > + > > +/* > > + * +--------------+ pgoff == 0 > > + * | meta page | > > + * +--------------+ pgoff == 1 > > + * | subbuffer 0 | > > + * | | > > + * +--------------+ pgoff == (1 + (1 << subbuf_order)) > > + * | subbuffer 1 | > > + * | | > > + * ... > > + */ > > +#ifdef CONFIG_MMU > > +static int __rb_map_vma(struct ring_buffer_per_cpu *cpu_buffer, > > + struct vm_area_struct *vma) > > +{ > > + unsigned long nr_subbufs, nr_pages, vma_pages, pgoff = vma->vm_pgoff; > > + unsigned int subbuf_pages, subbuf_order; > > + struct page **pages; > > + int p = 0, s = 0; > > + int err; > > + > > I'd add some comments here like > > /* Refuse any MAP_PRIVATE or writable mappings. */ > > + if (vma->vm_flags & VM_WRITE || vma->vm_flags & VM_EXEC || > > + !(vma->vm_flags & VM_MAYSHARE)) > > + return -EPERM; > > + > > /* > * Make sure the mapping cannot become writable later. Also, tell the VM > * to not touch these pages pages (VM_DONTCOPY | VM_DONTDUMP) and tell > * GUP to leave them alone as well (VM_IO). > */ > > + vm_flags_mod(vma, > > + VM_MIXEDMAP | VM_PFNMAP | > > + VM_DONTCOPY | VM_DONTDUMP | VM_DONTEXPAND | VM_IO, > > + VM_MAYWRITE); > > I am still really unsure about VM_PFNMAP ... it's not a PFNMAP at all and, > as stated, vm_insert_pages() even complains quite a lot when it would have > to set VM_MIXEDMAP and VM_PFNMAP is already set, likely for a very good > reason. > > Can't we limit ourselves to VM_IO? > > But then, I wonder if it really helps much regarding GUP: yes, it blocks > ordinary GUP (see check_vma_flags()) but as insert_page_into_pte_locked() > does *not* set pte_special(), GUP-fast (gup_fast_pte_range()) will not > reject it. > > Really, if you want GUP-fast to reject it, remap_pfn_range() and friends are > the way to go, that will set pte_special() such that also GUP-fast will > leave it alone, just like vm_normal_page() would. > > So ... I know Linus recommended VM_PFNMAP/VM_IO to stop GUP, but it alone > won't stop all of GUP. We really have to mark the PTE as special, which > vm_insert_page() must not do (because it is refcounted!). Hum, apologies, I am not sure to follow the connection here. Why do you think the recommendation was to prevent GUP? > > Which means: do we really have to stop GUP from grabbing that page? > > Using vm_insert_page() only with VM_MIXEDMAP (and without VM_PFNMAP|VM_IO) > would be better. Under the assumption we do not want to stop all GUP, why not using VM_IO over VM_MIXEDMAP which is I believe more restrictive? > > If we want to stop all of GUP, remap_pfn_range() currently seems unavoidable > :( > > > > + > > + lockdep_assert_held(&cpu_buffer->mapping_lock); > > + > > + subbuf_order = cpu_buffer->buffer->subbuf_order; > > + subbuf_pages = 1 << subbuf_order; > > + > > + nr_subbufs = cpu_buffer->nr_pages + 1; /* + reader-subbuf */ > > + nr_pages = ((nr_subbufs) << subbuf_order) - pgoff + 1; /* + meta-page */ > > + > > + vma_pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; > > + if (!vma_pages || vma_pages > nr_pages) > > + return -EINVAL; > > + > > + nr_pages = vma_pages; > > + > > + pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL); > > + if (!pages) > > + return -ENOMEM; > > + > > + if (!pgoff) { > > + pages[p++] = virt_to_page(cpu_buffer->meta_page); > > + > > + /* > > + * TODO: Align sub-buffers on their size, once > > + * vm_insert_pages() supports the zero-page. > > + */ > > + } else { > > + /* Skip the meta-page */ > > + pgoff--; > > + > > + if (pgoff % subbuf_pages) { > > + err = -EINVAL; > > + goto out; > > + } > > + > > + s += pgoff / subbuf_pages; > > + } > > + > > + while (s < nr_subbufs && p < nr_pages) { > > + struct page *page = virt_to_page(cpu_buffer->subbuf_ids[s]); > > + int off = 0; > > + > > + for (; off < (1 << (subbuf_order)); off++, page++) { > > + if (p >= nr_pages) > > + break; > > + > > + pages[p++] = page; > > + } > > + s++; > > + } > > + > > + err = vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); > > vm_insert_pages() documents: "In case of error, we may have mapped a subset > of the provided pages. It is the caller's responsibility to account for this > case." > > Which could for example happen, when allocating a page table fails. > > Would we able to deal with that here? As we are in the mmap path, on an error, I would expect the vma to be destroyed and those pages whom insertion succeeded to be unmapped? But perhaps shall we proactively zap_page_range_single()? > > > Again, I wish it would all be easier ... > > -- > Cheers, > > David / dhildenb > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >