Received: by 2002:a25:ad19:0:0:0:0:0 with SMTP id y25csp6600795ybi; Wed, 31 Jul 2019 17:47:57 -0700 (PDT) X-Google-Smtp-Source: APXvYqyACpmxgWNXskLk5e79zDiU9SUBHY9p0SlHNfndapllHNzhwrk/KUDW0/Dp2MhGBClhnTGu X-Received: by 2002:a65:52c5:: with SMTP id z5mr102290080pgp.118.1564620477751; Wed, 31 Jul 2019 17:47:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1564620477; cv=none; d=google.com; s=arc-20160816; b=fZe5tt1dJXMtpxxSrnyJIx375DSfafNPgQPe28banZBeUbGy/TFdsU4m62Fdn4W7uM eTiiVLCIHZIDu8LqQ8jNDJhd9sHyaKHsAyo+7/SO5+SHSenI4fKp42DeijQcXvtodTPb sKknO7450TaUBOmtHjKdCgvZSS/1jYYpaCphWcr9TE/u45W5LpZ9TzqS53wEcFWYj91m HQhWBuYRKOkNdd/vutQNwZY5mL09jctRbg9nBJv83iCVuoYERIw/FqHXXs3siVbNtA+V i8eyX/Dg4GjYuPsMrV8IYT/gU+sql3xvkJVfKfCfgZ84J9y7+3raUivj0Y8I6EQSBLZt 3tCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date; bh=lFLGYzIs86Jr04ESQk3Cj6S6gfRMeyghsL2qZzfMtNc=; b=KraRK92kasWrWotGDvUuqBGJ28QTrqeErxWlgM2rRhIi+qrW/2Ikuq62JAZBHMDuhl lAY1hj2kl+IMrffH2eR54yXxUtQ7SAtOqKI1upc/xyasGhHs3fkVUVmqJFWn4lTNfrGZ ZCCjF+8LxvTGeJshZcY7fWns5QelQKSlUMeLgpJyRpJQU+a/6Jp0YI4arKQuzrN/cEGW FFgDjVY7458LIVJVHz3ARky2VnmJlID0TfoEn7wPqTeLQCiXEFtyUdzDUQRX7//xbVTR yWT6xQKpvyHzgck3xFJC+G1RebiIlbnwjZNUATbyp+U0eZW9WlVEs0chDqaYyyzzoVjY eERQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t190si31869868pgd.465.2019.07.31.17.47.42; Wed, 31 Jul 2019 17:47:57 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728740AbfHAApn (ORCPT + 99 others); Wed, 31 Jul 2019 20:45:43 -0400 Received: from Galois.linutronix.de ([193.142.43.55]:33444 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726118AbfHAApn (ORCPT ); Wed, 31 Jul 2019 20:45:43 -0400 Received: from pd9ef1cb8.dip0.t-ipconnect.de ([217.239.28.184] helo=nanos) by Galois.linutronix.de with esmtpsa (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256) (Exim 4.80) (envelope-from ) id 1hszDv-0001p4-9m; Thu, 01 Aug 2019 02:45:39 +0200 Date: Thu, 1 Aug 2019 02:45:38 +0200 (CEST) From: Thomas Gleixner To: Gabriel Krisman Bertazi cc: mingo@redhat.com, peterz@infradead.org, dvhart@infradead.org, linux-kernel@vger.kernel.org, kernel@collabora.com, Zebediah Figura , Steven Noonan , "Pierre-Loup A . Griffais" Subject: Re: [PATCH RFC 2/2] futex: Implement mechanism to wait on any of several futexes In-Reply-To: <20190730220602.28781-2-krisman@collabora.com> Message-ID: References: <20190730220602.28781-1-krisman@collabora.com> <20190730220602.28781-2-krisman@collabora.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 30 Jul 2019, Gabriel Krisman Bertazi wrote: > + retry: > + for (i = 0; i < count; i++) { > + qs[i].key = FUTEX_KEY_INIT; > + qs[i].bitset = wb[i].bitset; > + > + ret = get_futex_key(wb[i].uaddr, flags & FLAGS_SHARED, > + &qs[i].key, FUTEX_READ); > + if (unlikely(ret != 0)) { > + for (--i; i >= 0; i--) > + put_futex_key(&qs[i].key); > + goto out; > + } > + } > + > + set_current_state(TASK_INTERRUPTIBLE); > + > + for (i = 0; i < count; i++) { > + ret = __futex_wait_setup(wb[i].uaddr, wb[i].val, > + flags, &qs[i], &hb); > + if (ret) { > + /* Drop the failed key directly. keys 0..(i-1) > + * will be put by unqueue_me. > + */ > + put_futex_key(&qs[i].key); > + > + /* Undo the partial work we did. */ > + for (--i; i >= 0; i--) > + unqueue_me(&qs[i]); That lacks a comment why this does not evaluate the return value of unqueue_me(). If one of the already queued futexes got woken, then it's debatable whether that counts as success or not. Whatever the decision is it needs to be documented instead of documenting the obvious. > + __set_current_state(TASK_RUNNING); > + if (ret > 0) > + goto retry; > + goto out; > + } > + > + /* We can't hold to the bucket lock when dealing with > + * the next futex. Queue ourselves now so we can unlock > + * it before moving on. > + */ > + queue_me(&qs[i], hb); > + } Why does this use two loops and two cleanup loops instead of doing it in a single one? > + > + /* There is no easy to way to check if we are wake already on > + * multiple futexes without waking through each one of them. So waking? > + ret = -ERESTARTSYS; > + if (!abs_time) > + goto out; > + > + ret = -ERESTART_RESTARTBLOCK; > + out: There is surely no more convoluted way to write this? That goto above is just making my eyes bleed. Yes I know where you copied that from and then removed everthing between the goto and the assignement. ret = abs_time ? -ERESTART_RESTARTBLOCK : -ERESTARTSYS; would be too simple to read. Also this is a new interface and there is absolutely no reason to make it misfeature compatible to FUTEX_WAIT even for the case that this might share code. Supporting relative timeouts is wrong to begin with and for absolute timeouts there is no reason to use restartblock. > +static int futex_wait_multiple(u32 __user *uaddr, unsigned int flags, > + u32 count, ktime_t *abs_time) > +{ > + struct futex_wait_block *wb; > + struct restart_block *restart; > + int ret; > + > + if (!count) > + return -EINVAL; > + > + > + wb = kcalloc(count, sizeof(struct futex_wait_block), GFP_KERNEL); There are a couple of wrongs here: - Lacks a sane upper limit for count. Relying on kcalloc() to fail is just wrong. - kcalloc() does a pointless zeroing for a buffer which is going to be overwritten anyway - sizeof(struct foo) instead of sizeof(*wb) count * sizeof(*wb) must be calculated anyway because copy_from_user() needs it. So using kcalloc() is pointless. > + if (!wb) > + return -ENOMEM; > + > + if (copy_from_user(wb, uaddr, > + count * sizeof(struct futex_wait_block))) { And doing the size calculation only once spares that fugly line break. But that's cosmetic. The way more serious issue is: How is that supposed to work with compat tasks? 32bit and 64 bit apps do not share the same pointer size and your UAPI struct is: > +struct futex_wait_block { > + __u32 __user *uaddr; > + __u32 val; > + __u32 bitset; > +}; A 64 bit kernel will happily read beyond the user buffer when called from a 32bit task and all struct members turn into garbage. That can be solved with padding so pointer conversion can be avoided, but you have to be careful about endianness. Coming back to to allocation itself: do_futex_wait_multiple() does another allocation depending on count. So you call into the memory allocator twice in a row and on the way out you do the same thing again with free. Looking at the size constraints. If I assume a maximum of 65 futexes which got mentioned in one of the replies then this will allocate 7280 bytes alone for the futex_q array with a stock debian config which has no debug options enabled which would bloat the struct. Adding the futex_wait_block array into the same allocation becomes larger than 8K which already exceeds thelimit of SLUB kmem caches and forces the whole thing into the page allocator directly. This sucks. Also I'm confused about the 64 maximum resulting in 65 futexes comment in one of the mails. Can you please explain what you are trying to do exatly on the user space side? Thanks, tglx