Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp2231018ybv; Fri, 21 Feb 2020 11:22:27 -0800 (PST) X-Google-Smtp-Source: APXvYqzt9Kzhb5uU8EUNQN0mOJwLopS8TvmRHWcAtPLsQdZRyy0vwku1BUqyJTL8VyNhdciaTxXp X-Received: by 2002:a9d:66c1:: with SMTP id t1mr26763533otm.73.1582312946950; Fri, 21 Feb 2020 11:22:26 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1582312946; cv=none; d=google.com; s=arc-20160816; b=j2N7HUn3njHYWvKYY1/orS6UwuHt5R1O3RkVFQGQoK6Ex/gFneeRqL/ysPGzdC8LWA RJmcpUwDCv1rcsOOtX9vZX3eoUDSvIs5OPJB7jLWtYknZCRbfgbShinpUtUoC0vIgVu1 uzFqNlZwl9D4ny5+k95Nwe1w55+fgcFeE7k88Rd/VvufkjJEpFny+BgtAajxe0geN5hD StXV4E5eqHYzmDneY2V/Ya19tvvzuyNaHcfZGPcAr9BPUk7iNfmprMkFe9fM5rumHBHH YjBYazpLBU4eY1rPYppmFRlL/PmNvszXdo5rIaQAtpzHXpuSv3hUoFSsgJfKqeHWeQim xvnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:date:from:dkim-signature; bh=bDDD9QS51C8HABJX5Kswii4LZuMup2Q1KmKUNTg9WcI=; b=dmcOrIQRt3CnDy7kH6UW+m0Th/Zcoo4Hoam/Y8Wg2gXbXjemKv4jJVS07G7VpJlQ7i EQY5LPIkc86pgEGm+3ZxiVLTYbQX7vUkc3PZr41aLml3NK3Y08WHnbEyptJRWR6CFMLk lHqeHoJrJETyXzx3U+4GlAeAtofKC0NXsP1yyMWDN+jfMmNYmFLOy8576cHVzAjA7ht3 VC+r1wZ4VBd8o9V0JrOBD0Qe1YG6fjmTxNpQMJ/kADFoIC73cW81XcxY56d45s6w86Jm EVRNtMk+9oVYjO2K9vQSQyj2trfVr/EyfCWC0rbbZEMiiQHLPvVjTwJ47dhzbUqAV5mj f7Aw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="d1rNua/l"; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a3si1340498oib.168.2020.02.21.11.22.15; Fri, 21 Feb 2020 11:22:26 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="d1rNua/l"; spf=pass (google.com: best guess record for domain of linux-ext4-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729455AbgBUTWH (ORCPT + 99 others); Fri, 21 Feb 2020 14:22:07 -0500 Received: from mail-lj1-f195.google.com ([209.85.208.195]:35962 "EHLO mail-lj1-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726423AbgBUTWH (ORCPT ); Fri, 21 Feb 2020 14:22:07 -0500 Received: by mail-lj1-f195.google.com with SMTP id r19so3348771ljg.3; Fri, 21 Feb 2020 11:22:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:date:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=bDDD9QS51C8HABJX5Kswii4LZuMup2Q1KmKUNTg9WcI=; b=d1rNua/lZJ5hE0ZUd/ZtqyYz4uPoKdWgw7gFNQSOhnpmXpqoyRpxXPAZ3AzI6QHfu7 uxZ/EjEufQ4uoqzIbMaUdH/+ZGiz/CU+TNxr3LOiB10oKqgdBmq1ULCWikJk9lQ424wM LG8stYBGhHincNDLy52LYL7GoRP9K4qz8itg0UOUMLauWe6VssqKqbGFJK5i/BB5Qtp9 1ojRAi+TaPtTbeGuAlyS8JvJFQDrUCl88UEmQ24qaqPl69s4k2vqCib2w30EHqwOn9J8 2DAmWCXisbBxYhE6WtCp1plmCo2OJOq3tlQumJ0ecMe7+KJoJgMOWEPvjE3+LVQaLsgt x1bg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:date:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=bDDD9QS51C8HABJX5Kswii4LZuMup2Q1KmKUNTg9WcI=; b=T5YHAkr9iYiZK0OWxjXe6z6wcqd05iZAlLBHtx/yMBxLPutndouFEqU9B9MSxQRaUv kgm1Sh7xvPMSDtTgsjnRBE9++4sF2rC5q+y72VmOJsbDk94+jdpBZySlIuqJizT51hhc tYc+dVbnH60IXWtIvi1GU2hpo2473PrMuIuws8HBHmSudarQ4HiyucgmkLTerZ21KF/b YoEMaT9bESTmtXWvCp6i7NRr6hCaqsdet3MiP/Jbv9mM4WhMHPdgX965papkwD5Npi8d PgS/KuEemeiWPOh7mfYSfZufrDFntbI2kr7AwIHVialVD74cG5C9aOYC5F9sQgOkfI4K ZDaA== X-Gm-Message-State: APjAAAVSVMi1ENADO6aY6UctZWmA9RLipxGyB9PxQmFK7Rf8EVr6J9+a k5pvnMWq/1sA5gWXDpxAzDg= X-Received: by 2002:a05:651c:314:: with SMTP id a20mr23125542ljp.91.1582312924151; Fri, 21 Feb 2020 11:22:04 -0800 (PST) Received: from pc636 ([37.139.158.167]) by smtp.gmail.com with ESMTPSA id n3sm2116680ljc.100.2020.02.21.11.22.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Feb 2020 11:22:03 -0800 (PST) From: Uladzislau Rezki X-Google-Original-From: Uladzislau Rezki Date: Fri, 21 Feb 2020 20:21:52 +0100 To: Joel Fernandes Cc: Uladzislau Rezki , "Theodore Y. Ts'o" , "Paul E. McKenney" , Ext4 Developers List , Suraj Jitindar Singh , LKML , rcu@vger.kernel.org Subject: Re: [PATCH RFC] ext4: fix potential race between online resizing and write operations Message-ID: <20200221192152.GA6306@pc636> References: <20200215233817.GA670792@mit.edu> <20200216121246.GG2935@paulmck-ThinkPad-P72> <20200217160827.GA5685@pc636> <20200217193314.GA12604@mit.edu> <20200218170857.GA28774@pc636> <20200221120618.GA194360@google.com> <20200221132817.GB194360@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200221132817.GB194360@google.com> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org > > > > Overall this implementation is nice. You are basically avoiding allocating > > rcu_head like Ted did by using the array-of-pointers technique we used for > > the previous kfree_rcu() work. > > > > One thing stands out, the path where we could not allocate a page for the new > > block node: > > > > > @@ -3061,6 +3148,11 @@ void kfree_call_rcu(struct rcu_head *head, rcu_callback_t func) > > > if (krcp->initialized) > > > spin_unlock(&krcp->lock); > > > local_irq_restore(flags); > > > + > > > + if (!skip_call_rcu) { > > > + synchronize_rcu(); > > > + kvfree(ptr_to_free); > > > > We can't block, it has to be async otherwise everything else blocks, and I > > think this can also be used from interrupt handlers which would at least be > > an SWA violation. So perhaps it needs to allocate an rcu_head wrapper object > > itself for the 'emergeny case' and use the regular techniques. > > > > Another thing that stands out is the code duplication, if we can make this > > reuse as much as of the previous code as possible, that'd be great. I'd like > > to avoid bvcached and bvhead if possible. Maybe we can store information > > about the fact that this is a 'special object' in some of the lower-order > > bits of the pointer. Then we can detect that it is 'special' and free it > > using kvfree() during the reclaim > > Basically what I did different is: > 1. Use the existing kfree_rcu_bulk_data::records array to store the > to-be-freed array. > 2. In case of emergency, allocate a new wrapper and tag the pointer. > Read the tag later to figure its an array wrapper and do additional kvfree. > I see your point and agree that duplication is odd and we should avoid it as much as possible. Also, i like the idea of using the wrapper as one more chance to build a "head" for headless object. I did not mix pointers because then you will need to understand what is what. It is OK for "emergency" path, because we simply can just serialize it by kvfree() call, it checks inside what the ptr address belong to: void kvfree(const void *addr) { if (is_vmalloc_addr(addr)) vfree(addr); else kfree(addr); } whereas normal path, i mean "bulk one" where we store pointers into array would be broken. We can not call kfree_bulk(array, nr_entries) if the passed array contains "vmalloc" pointers, because it is different allocator. Therefore, i deliberately have made it as a special case. > > Perhaps the synchronize_rcu() should be done from a workqueue handler > to prevent IRQ crapping out? > I think so. For example one approach would be: struct free_deferred { struct llist_head list; struct work_struct wq; }; static DEFINE_PER_CPU(struct free_deferred, free_deferred); static void free_work(struct work_struct *w) { struct free_deferred *p = container_of(w, struct free_deferred, wq); struct llist_node *t, *llnode; synchronize_rcu(); llist_for_each_safe(llnode, t, llist_del_all(&p->list)) vfree((void *)llnode, 1); } static inline void free_deferred_common(void *ptr_to_free) { struct free_deferred *p = raw_cpu_ptr(&free_deferred); if (llist_add((struct llist_node *)ptr_to_free, &p->list)) schedule_work(&p->wq); } and it seems it should work. Because we know that KMALLOC_MIN_SIZE can not be less then machine word: /* * Kmalloc subsystem. */ #ifndef KMALLOC_MIN_SIZE #define KMALLOC_MIN_SIZE (1 << KMALLOC_SHIFT_LOW) #endif when it comes to vmalloc pointer it can not be less then one PAGE_SIZE :) Another thing: we are talking about "headless" variant that is special, therefore it implies to have some restrictions, since we need a dynamic memory to drive it. For example "headless" object can be freed from preemptible context only, because freeing can be inlined: + // NOT SURE if permitted due to IRQ. Maybe we + // should try doing this from WQ? + synchronize_rcu(); + kvfree(ptr); Calling synchronize_rcu() from the IRQ context will screw the system up :) Because the current CPU will never pass the QS state if i do not miss something. Also kvfree() itself can be called from the preemptible context only, excluding IRQ, there is a special path for it, otherwise vfree() can sleep. > > debug_objects bits wouldn't work obviously for the !emergency kvfree case, > not sure what we can do there. > Agree. Thank you, Joel, for your comments! -- Vlad Rezki