Received: by 2002:a05:6358:3188:b0:123:57c1:9b43 with SMTP id q8csp26347110rwd; Mon, 3 Jul 2023 08:29:37 -0700 (PDT) X-Google-Smtp-Source: APBJJlEB5o2IE0VgCYVI5KB5R451t8RznPnbb2sh3bP7mpuH8SjSBVnzE6aYfXDUaY0KbPZ5gydA X-Received: by 2002:a17:90a:312:b0:263:ad97:38f1 with SMTP id 18-20020a17090a031200b00263ad9738f1mr7952470pje.3.1688398177593; Mon, 03 Jul 2023 08:29:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1688398177; cv=none; d=google.com; s=arc-20160816; b=RShTqXy7GfEIYHHuffejhr68JU0pXczH5nEzMQVXf6856qyAhSQwIVfyAiN6gm26Gn j2uLllEeNEwfq7mruWtjXCzw/9lkgLffEYZ6E8Mwigqkl0x0HEZ7A8PBxSButliRLoAT HX+e+GpqHZQxCHUZbymCgX+oAXqIeCEyFNnkFIMwpKFMvmZwrSQs4xLGYvngtt/Qq6oS ls/57QeZDX6nZkk17S96pAj+dAgfzdnMfo6rfKNp+ioMIsFz/OTznCEVsnvdiBTa1961 AKWsBXGSULwUw0uYXp0vBAKtdVvP8VKVy6T6VfOCAUBgFnUiR8PoWO3vl1sx7V2wWZVr KKAg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date; bh=IOjp4R4v7q8jXAPkF1mg3yVUUaxxRoCbm+O01cVz82U=; fh=jrQ9fiPTWNRNKH0mgCFvH9s+O+y7O3fypX/goJ4ZF1w=; b=VfjNujXhb6tKCOW72F2d8BDbpElbLOgEq+POmyGlQWHSooZYXTVQYQ8DyfUSKZlYSY JKnBaASI3YJPZLeuPxLQFAZeAV5qiLrG+kWo0Ys7JmEuzMJvK5uTXidHZJOmCojh5DZb CuHhgBVK27v57IH9gofb6R00fdHMP3LRzv1OmEStFzPd+Shl0ShjI/bhm3tg1f2zCgBS 7mjqj/dLQuJiHcDx96wrXlYuhCE0RrfufFUvUplOg9/ECEDYT8m4J/w9fFyqk5VSdGCm CHp25hitFLJajhDtz5bEaZScDY+S36yn/oPoVgJpoO4UB9iKW2FHypsEhLOgishEAAxJ F5Ng== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lb11-20020a17090b4a4b00b002631f72d2d1si13474238pjb.173.2023.07.03.08.29.24; Mon, 03 Jul 2023 08:29:37 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231707AbjGCPJH convert rfc822-to-8bit (ORCPT + 99 others); Mon, 3 Jul 2023 11:09:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48638 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231381AbjGCPJG (ORCPT ); Mon, 3 Jul 2023 11:09:06 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3AE5E44; Mon, 3 Jul 2023 08:09:01 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6004960F99; Mon, 3 Jul 2023 15:09:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 595B7C433C8; Mon, 3 Jul 2023 15:08:59 +0000 (UTC) Date: Mon, 3 Jul 2023 11:08:57 -0400 From: Steven Rostedt To: Ajay Kaher Cc: "mhiramat@kernel.org" , "shuah@kernel.org" , "linux-kernel@vger.kernel.org" , "linux-trace-kernel@vger.kernel.org" , "linux-kselftest@vger.kernel.org" , Ching-lin Yu , Nadav Amit , "srivatsa@csail.mit.edu" , Alexey Makhalov , Vasavi Sirnapalli , Tapas Kundu , "er.ajay.kaher@gmail.com" Subject: Re: [PATCH v3 03/10] eventfs: adding eventfs dir add functions Message-ID: <20230703110857.2d051af5@rorschach.local.home> In-Reply-To: References: <1685610013-33478-1-git-send-email-akaher@vmware.com> <1685610013-33478-4-git-send-email-akaher@vmware.com> <20230701095417.3de5baab@rorschach.local.home> X-Mailer: Claws Mail 3.17.8 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Spam-Status: No, score=-4.0 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, 3 Jul 2023 10:13:22 +0000 Ajay Kaher wrote: > >> +/** > >> + * eventfs_down_write - acquire write lock function > >> + * @eventfs_rwsem: a pointer to rw_semaphore > >> + * > >> + * helper function to perform write lock on eventfs_rwsem > >> + */ > >> +static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem) > >> +{ > >> + while (!down_write_trylock(eventfs_rwsem)) > >> + msleep(10); > > > > What's this loop for? Something like that needs a very good explanation > > in a comment. Loops like these are usually a sign of a workaround for a > > bug in the design, or worse, simply hides an existing bug. > > > > Yes correct, this logic is to solve deadlock: > > Thread 1 Thread 2 > down_read_nested() - read lock acquired > down_write() - waiting for write lock to acquire > down_read_nested() - deadlock > > Deadlock is because rwlock wouldn’t allow read lock to be acquired if write lock is waiting. > down_write_trylock() wouldn’t add the write lock in waiting queue, hence helps to prevent > deadlock scenario. > > I was stuck with this Deadlock, tried few methods and finally borrowed from cifs, as it’s > upstreamed, tested and working in cifs, please refer: > https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438 I just looked at that code and the commit, and I honestly believe that is a horrible hack, and very fragile. It's in the smb code, so it was unlikely reviewed by anyone outside that subsystem. I really do not want to prolificate that solution around the kernel. We need to come up with something else. I also think it's buggy (yes the cifs code is buggy!) because in the comment above the down_read_nested() it says: /* * nested locking. NOTE: rwsems are not allowed to recurse * (which occurs if the same task tries to acquire the same * lock instance multiple times), but multiple locks of the * same lock class might be taken, if the order of the locks * is always the same. This ordering rule can be expressed * to lockdep via the _nested() APIs, but enumerating the * subclasses that are used. (If the nesting relationship is * static then another method for expressing nested locking is * the explicit definition of lock class keys and the use of * lockdep_set_class() at lock initialization time. * See Documentation/locking/lockdep-design.rst for more details.) */ So this is NOT a solution (and the cifs code should be fixed too!) Can you show me the exact backtrace where the reader lock gets taken again? We will have to come up with a way to not take the same lock twice. We can also look to see if we can implement this with RCU. What exactly is this rwsem protecting? > > Looking further for your input. I will add explanation in v4. > > > >> +} > >> + [..] > >> + * > >> + * This function creates the top of the trace event directory. > >> + */ > >> +struct dentry *eventfs_create_events_dir(const char *name, > >> + struct dentry *parent, > >> + struct rw_semaphore *eventfs_rwsem) > > > > OK, I'm going to have to really look at this. Passing in a lock to the > > API is just broken. We need to find a way to solve this another way. > > eventfs_rwsem is a member of struct trace_array, I guess we should > pass pointer to trace_array. No, it should not be part of the trace_array. If we can't do this with RCU, then we need to add a descriptor that contains the dentry that is returned above, and have the lock held there. The caller of the eventfs_create_events_dir() should not care about locking. That's an implementation detail that should *not* be part of the API. That is, if you need a lock: struct eventfs_dentry { struct dentry *dentry; struct rwsem *rwsem; }; And then get to that lock by using the container_of() macro. All created eventfs dentry's could have this structure, where the rwsem points to the top one. Again, that's only if we can't do this with RCU. -- Steve > > > > I'm about to board a plane to JFK shortly, I'm hoping to play with this > > while flying back. > > > > I have replied for major concerns. All other minor I will take care in v4. > > Thanks a lot for giving time to eventfs patches. > > - Ajay >