Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No.
        3798903
From: David Howells
In-Reply-To: <20190529231112.GB3164@kroah.com>
References: <20190529231112.GB3164@kroah.com>
        <20190528231218.GA28384@kroah.com>
        <20190528162603.GA24097@kroah.com>
        <155905930702.7587.7100265859075976147.stgit@warthog.procyon.org.uk>
        <155905931502.7587.11705449537368497489.stgit@warthog.procyon.org.uk>
        <4031.1559064620@warthog.procyon.org.uk>
        <31936.1559146000@warthog.procyon.org.uk>
To: Greg KH
Cc: dhowells@redhat.com, viro@zeniv.linux.org.uk, raven@themaw.net,
        linux-fsdevel@vger.kernel.org, linux-api@vger.kernel.org,
        linux-block@vger.kernel.org, keyrings@vger.kernel.org,
        linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] General notification queue with user mmap()'able ring buffer
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <3762.1559314508.1@warthog.procyon.org.uk>
Date: Fri, 31 May 2019 15:55:08 +0100
Message-ID: <3763.1559314508@warthog.procyon.org.uk>
X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Fri, 31 May 2019 14:55:12 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Greg KH wrote:

> So, if that's all that needs to be fixed, can you use the same
> buffer/code if that patch is merged?

I really don't know.  The perf code is complex, partially in hardware drivers
and is tricky to understand - though a chunk of that is the "aux" buffer part;
PeterZ used words like "special" and "magic" and the comments in the code talk
about the hardware writing into the buffer.

__perf_output_begin() does not appear to be SMP safe.  It uses local_cmpxchg()
and local_add() which on x86 lack the LOCK prefix.

stracing the perf command on my test machine, it calls perf_event_open(2) four
times and mmap's each fd it gets back.  I'm guessing that each one maps a
separate buffer for each CPU.
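
To illustrate the LOCK-prefix point: a reservation step that several CPUs can run against one shared head needs a fully atomic compare-and-exchange, not a CPU-local one.  The following is only a userspace sketch using C11 atomics (on x86 the compare-exchange compiles to a LOCK-prefixed CMPXCHG).  struct shared_ring and ring_reserve() are hypothetical names made up for this example; this is not the perf code and not what the watch_queue patches do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring shared by all producers; indices are free-running. */
struct shared_ring {
	_Atomic uint32_t head;	/* next free slot, advanced by producers */
	_Atomic uint32_t tail;	/* next slot to read, advanced by the consumer */
	uint32_t size;		/* number of slots */
};

/*
 * Try to reserve n slots.  On success, store the free-running index of the
 * first reserved slot in *start and return true.  Safe against concurrent
 * callers because head only moves via an atomic compare-exchange.
 */
static bool ring_reserve(struct shared_ring *r, uint32_t n, uint32_t *start)
{
	uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);

	do {
		uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
		uint32_t used = head - tail;	/* unsigned wraparound is intended */

		/* Reject a tail that is ahead of head or too far behind it. */
		if (used > r->size || n > r->size - used)
			return false;
		/* On failure the CAS reloads head and the checks are redone. */
	} while (!atomic_compare_exchange_weak_explicit(&r->head, &head, head + n,
							memory_order_acquire,
							memory_order_relaxed));
	*start = head;
	return true;
}
```

The sanity check on tail is included only to show where such a check could sit; whether it is worth the cost in a real producer path is a separate question.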

So to use watch_queue based on perf's buffering, you would have to have a
buffer of (2^N)+1 pages for each CPU.  That would be a minimum of 64K of
unswappable memory for my desktop machine (say).  Multiply that by each
process that wants to listen for events...

What I'm aiming for is something that has a single buffer used by all CPUs for
each instance of /dev/watch_queue opened, and I'd also like to avoid having to
allocate the metadata page and the aux buffer, to save space.  This is locked
memory and cannot be swapped.

Also, perf has to leave a gap in the ring because it uses CIRC_SPACE(), though
that's a minor detail that I guess can't be fixed now.

I'm also slightly concerned that __perf_output_begin() doesn't check whether
rb->user->tail has got ahead of rb->user->head or is lagging too far behind.
I doubt it's a serious problem for the kernel since it won't write outside of
the buffer; I think the worst that will happen is that userspace will get
confused.

One thing I would like is to waive the 2^N size requirement.  I understand
*why* we do that, but I wonder how expensive DIV instructions are for
relatively small divisors.

David
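
For reference, the CIRC_SPACE() gap and the 2^N requirement both fall out of the index arithmetic.  The sketch below copies the arithmetic of CIRC_CNT()/CIRC_SPACE() from <linux/circ_buf.h> into plain functions, and sets it beside a hypothetical modulo-based variant (mod_cnt() and mod_advance() are names invented here) that accepts any size at the cost of a compare or a division.  It is an illustration only, not a proposal for the actual buffer layout.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as CIRC_CNT(): only valid when size is a power of two. */
static uint32_t circ_cnt(uint32_t head, uint32_t tail, uint32_t size)
{
	return (head - tail) & (size - 1);
}

/*
 * Same arithmetic as CIRC_SPACE(): one slot is always left unused so that
 * head == tail can only mean "empty", never "full".
 */
static uint32_t circ_space(uint32_t head, uint32_t tail, uint32_t size)
{
	return circ_cnt(tail, head + 1, size);
}

/* Hypothetical variant: indices kept in [0, size), size may be any value. */
static uint32_t mod_cnt(uint32_t head, uint32_t tail, uint32_t size)
{
	return head >= tail ? head - tail : size - tail + head;
}

/* Advancing an index needs a modulo (a DIV) instead of a mask. */
static uint32_t mod_advance(uint32_t idx, uint32_t n, uint32_t size)
{
	return (idx + n) % size;
}
```

With size 8, circ_space(0, 0, 8) is 7 rather than 8, which is the gap mentioned above.  The modulo variant works for a size such as 6, but with wrapped indices it still cannot tell full from empty without either a gap or free-running counters.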