Received: by 2002:ad5:474a:0:0:0:0:0 with SMTP id i10csp400919imu; Wed, 12 Dec 2018 19:51:53 -0800 (PST) X-Google-Smtp-Source: AFSGD/UO+vB5GW7h5gjsllTRKjew0PA3DogHsMTgISh33+QaCwHwtMbBJ0/IqPQW14GwRZ1hpJc8 X-Received: by 2002:a17:902:bf0c:: with SMTP id bi12mr22499031plb.0.1544673113512; Wed, 12 Dec 2018 19:51:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1544673113; cv=none; d=google.com; s=arc-20160816; b=xjPMwwGIeGIdv7wNppDlHZew/Gag5ob4WWl0DZ2k/v4Sc0+y9z6yHOaD5bFfYJjIoR IYhyBNHFS3u1Wm5NbUYX4zqx4xCH1ZnB+5xkTrnmsdPLeMNbH7eCMg77YOBHzFoTJo7B t6O329jrThc52VYykMbZ+psHrUL9i1aUQfA2Y9hliwZ/u7klKpl8O85MbTHpU4f6itzw W5teRuXrAZOzskm9VO1doB8KB+6NBc5FIUD5/2+PTfBL2R9qY/qiEbbv5ehtSqblP9Oj PM81PH632KxWrml//x+QWE/QOliWobq3Jw0L7KFjKV6SnWX6Xo1VcZ0GFU4BsCl2WdEe O4nQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:message-id:references :in-reply-to:subject:cc:date:to:from; bh=rWeuGiWKdQTZpdWlfTLjp7QH8W8NvO9AatFpq/1CoIA=; b=FA51/fGurqlKAAMwG5LG7EVbz3REAFyqlC1f0cmchfQ9qzqedDKqhmdTkbFTdlIOkk ctwuQsjbCXworipB6xjef32f5E3hj1WGginshLJMcYeRFOcF796Py7krsXwbcXCoTZ3A B6ROFQXRfU1HhOnCrBvuhIPoZZDdE3eotxAXY00frXCf+LueOcqv85Z64dg9Sz6hY4IL 3iXqAfR8YdYEhDGM/gbStII9T8ZGcWwLqwuEwlw/VX7N6xwBuWS1ktZKHdePtSJTsVzH +nxxmP9w4JBwOVfQHp6D/Uiv9rLNUJ6Pt6K+k6r0RCzrP+J00m5qpDlU1gOEe8vrSWrQ YJWg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z8si539668pgf.577.2018.12.12.19.51.24; Wed, 12 Dec 2018 19:51:53 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726806AbeLMDtN (ORCPT + 99 others); Wed, 12 Dec 2018 22:49:13 -0500 Received: from mx2.suse.de ([195.135.220.15]:60244 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726437AbeLMDtN (ORCPT ); Wed, 12 Dec 2018 22:49:13 -0500 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 19FD9AC81; Thu, 13 Dec 2018 03:49:10 +0000 (UTC) From: NeilBrown To: Herbert Xu Date: Thu, 13 Dec 2018 14:48:59 +1100 Cc: Thomas Graf , Tom Herbert , David Miller , netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH net-next] rhashtable: further improve stability of rhashtable_walk In-Reply-To: <20181213014309.uufaeoijhympgxbz@gondor.apana.org.au> References: <153086109256.2825.15329014177598382684.stgit@noble> <87zhtkeimx.fsf@notabene.neil.brown.name> <20181207053943.7zacyn5uvqkfnfoi@gondor.apana.org.au> <87k1kico1o.fsf@notabene.neil.brown.name> <20181211051755.modgomqzszkbiihe@gondor.apana.org.au> <87mupbvch0.fsf@notabene.neil.brown.name> <20181212054601.wbzpxjunnsfi62mz@gondor.apana.org.au> <87efanuu06.fsf@notabene.neil.brown.name> <20181212080037.j2zs22t57uxdu2jr@gondor.apana.org.au> <87bm5ruo3f.fsf@notabene.neil.brown.name> <20181213014309.uufaeoijhympgxbz@gondor.apana.org.au> Message-ID: <8736r2ulw4.fsf@notabene.neil.brown.name> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org --=-=-= Content-Type: text/plain Content-Transfer-Encoding: quoted-printable On Thu, Dec 13 2018, Herbert Xu wrote: > On Wed, Dec 12, 2018 at 07:49:08PM +1100, NeilBrown wrote: >> On Wed, Dec 12 2018, Herbert Xu wrote: >>=20 >> > On Wed, Dec 12, 2018 at 05:41:29PM +1100, NeilBrown wrote: >> >> >> >> So you would substantially slow down the rhashtable_walk_start() step. >> > >> > This whole thing is meant for uses such as /proc and netlink >> > enumeration. Speed is definitely not a prerogative of these >> > iterators. >>=20 >> An RCU grace period is typically 10s of milliseconds. It doesn't take >> very many of those to start being noticed. > > Why would you even need an RCU grace period for deleting and adding > the walker object to the bucket list? You just allocate a new one > each time and free the old one after an RCU grace period. I don't > see where the latency is coming from. Yes, you could rcu_free the old one and allocate a new one. Then you would have to be ready to deal with memory allocation failure which complicates usage (I already don't like that rhashtable_insert() can report -ENOMEM!). Another problem with inserting a marker object on rhashtable_walk_stop() is that it might not be clear where to insert it. You would have to take the spinlock, and then you might find that the location that you want to mark has been deleted from the list. I think you can probably still manage a reliable placement, but it could get messy. > >> > For that matter, if speed was an issue then surely you would >> > not drop out of the RCU read lock at all while iterating. >>=20 >> tipc_nl_sk_walk() drops out of RCU for every object. >> I don't know what it is used for, but I doubt that making it thousands >> of times slower would be a good thing. > > Now you're conflating two different things. Dropping the RCU > isn't necessarily slow. We were talking about waiting for an > RCU grace period which would only come into play if you were > suspending the walk indefinitely. Actually as I said above even > there you don't really need to wait. How would rhashtable_walk_stop() know if it was indefinite or not? Yes, if you allocate a new marker on each rhashtable_walk_start() (passing in a gfp_t and coping with errors) then you don't need to wait a grace period. If you reuse one to avoid the errors, you do. > >> > It sounds to me like you're trying to use this interface for >> > something that it's simply not designed to do. >>=20 >> I'm just trying to make sure we have a reliable data structure which I >> can use without having to be worried that I might accidentally hit some >> unsupported behaviour. > > Now that is something I agree with. > >> I don't really see why you think my patch is all that complex. >> The only conceptual change is that hash chains are now ordered, and >> ordered chains are easy to understand. >> The biggest part of the code change is just moving some code from >> rhashtable_walk_start() to rhashtable_walk_next(). >>=20 >> Why don't you like it? > > My main beef is the fact that it doesn't work with rhlist. So down > the track either we'll have to add more complexity for rhlist or > we will have to rip this out again do something completely different. You have previously said that functionality which isn't needed by current code should not be a concern. No current code ever stops and restarts an rhlist walk, so this shouldn't be relevant. Solve that problem when/if we come to it. If this is really the main beef, then let's turn the question around. Suppose this patch had landed before rhltables had landed. How would we implement rhltables without breaking this functionality? Keeping all the objects with the same key in a linked list is clearly a good idea - make this a circular list as there is no obvious "first". *Not* keeping them all in the hash chain is ideal, but not essential. I see three costs with this. One is that we would compare the same key multiple times for lookup. How much of a problem is that? A failing compare is usually quite quick, and most rhltable uses have inline memcmp for comparison (admittedly not all). The second cost is tracking the chain length against elasticity. We could flag one object with each key as a 'master' (low bit of the 'next' pointer) and only count the masters. When lookup raced with remove this might get a slightly incorrect count, but I don't think that hurts. Finally, there is more pointer chasing as the chains are longer. Was performance part of the reason for adding rhltables? It isn't mentioned in the commit message, but it is easy to miss things. Thanks, NeilBrown > > Cheers, > --=20 > Email: Herbert Xu > Home Page: http://gondor.apana.org.au/~herbert/ > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt --=-=-= Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAEBCAAdFiEEG8Yp69OQ2HB7X0l6Oeye3VZigbkFAlwR1qsACgkQOeye3VZi gbm9EQ//YMmhaw/7qwcdmu7LkybnSiR7bI0EkcaEFLLLL2ujLgOQmg4SpaueNFuo PLOvmOfbpq1UGGt8OVOjPt0Fy4M/Bca0CczgkVZY8eagQtoTkxa9KFgsddZPLB4q 2RguKZr489AJre1/Q2JIONdS4ZnIMQXclnlLHjeMp4pSUv6C4pL4BUQ+z8K3geD7 EREz6SMfGSsU1oIY49QAomKQ2yjlxwbYYc2FbLVF/2YmO9DJPrWGTVNUV2AevdlT dJNDxkR34WqK2sps4VRdZjRdawxbdR9CZrhh28DbxAO8Z0lV0zJfm/7buo3dX1Br h4T7+U6i+WPqH/nGCzda6H1j1OC23eIjGME9hVcv/RpO881Pnd0oxGxP1ly0i01G GmDr913bhzFqUEh/lVFGQAtfnEaxK7NK/Z57gWKC1+yr41lAmo8fZGY9xW0bCA7L Lr5/PhFYL2INTY3WC8P2VtJzYhfM+TfJbOn7dP6lFkKgd0kS5ieiLvaSfZ+Ua/hI gswPlPgWPdU0adNu789bszpm00lybI8SNP1JkDTXsEIZw/HwYovtpIt7HvlwVj58 6XrWXi+upb1FiH/uEBDtjONx0UJsoLHK16gV28ZWkVJlDHtW7Gyg87ERyd79UOtA DrzmX29zvK/ob5alPwXIvbeJzsy2J52889bGHYwxbKhhAUuu8n8= =7HH4 -----END PGP SIGNATURE----- --=-=-=--