Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp6476256imm; Mon, 27 Aug 2018 17:06:51 -0700 (PDT) X-Google-Smtp-Source: ANB0VdZKx8asb9tzREYk9U5NS6ikMiVLxMdZZpXjn55BPZccJ2EVDBAd2NluDwTYCQECzScUb4PM X-Received: by 2002:a17:902:b491:: with SMTP id y17-v6mr3790365plr.160.1535414811601; Mon, 27 Aug 2018 17:06:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535414811; cv=none; d=google.com; s=arc-20160816; b=vWCNGSytWnQRrZIUKv/BnDY/S7XflXgEaFNVIpcZhoQcwF8HyJWlJBiweNShh8ZKjj HkudOnTw67ORkF4aKRNaCaldDnAcygGo2QVlvHacq9YP1Cg4g4ZQkLQa/2VmJYec5MKy jdFR6ocCbvChk7ivW1CTn/W7SkT1lo78GbqDRBjax6RvPPba3Ocd5yWXD0ZNAgarh0JQ jlHr+Ow6k7NZfQHGnmUJrHyVoJ50PRyp8CAmG8CUe+NwW4GBUrxnwDc+UFNL7LaTiwAj gxCzCteVYQI8oK2giQHwq8cZGIkzhvM2MuJ1DoO7MulKfeqezVL9XuKtvVrzDrdYItxp Cepw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:arc-authentication-results; bh=4bzSVGRQdL4lOsspuC6wH6sCGpNVrIWKvuay2tWgl3c=; b=NaQHUbhO5Rbmxq9PQ/gcrr7H9u78fA6EvGW8n3P9CLpS/ZEh6dVXE+oFaUrZEB1I1m TeVCMcDfZ8k0JQy2aQ7ISaclYbFsPUX+sEweD52lNzpdr2dCHJCj8mFxeLPTtni3oaw/ mjH/1Pkeu5GsTZjqmvlJpE1+7q04EJnIQCQnZjZoWh/rdvJBOEDhSOmw3Jlc2nZLqoH8 b0/vrDo2To3gZcLcA4i/UVV5XciILSMgRoUTWVNUgl9tXyyEJPA/EZcrDly0dfXU9WOY /VrcxMD1zsAqgfJ1Cbt0d3b8/TCmOU+SPJDhIBO9pvkaMgiHfKJVgffkV1Aq5UVauCyr 3XOw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k1-v6si621822pgj.522.2018.08.27.17.06.06; Mon, 27 Aug 2018 17:06:51 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727175AbeH1DwK (ORCPT + 99 others); Mon, 27 Aug 2018 23:52:10 -0400 Received: from zeniv.linux.org.uk ([195.92.253.2]:39378 "EHLO ZenIV.linux.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726994AbeH1DwJ (ORCPT ); Mon, 27 Aug 2018 23:52:09 -0400 Received: from viro by ZenIV.linux.org.uk with local (Exim 4.87 #1 (Red Hat Linux)) id 1fuRTS-0000kN-Kd; Tue, 28 Aug 2018 00:03:10 +0000 Date: Tue, 28 Aug 2018 01:03:10 +0100 From: Al Viro To: Cong Wang Cc: Jamal Hadi Salim , Kees Cook , LKML , Jiri Pirko , David Miller , Linux Kernel Network Developers Subject: Re: [PATCH] net: sched: Fix memory exposure from short TCA_U32_SEL Message-ID: <20180828000310.GE6515@ZenIV.linux.org.uk> References: <20180826055801.GA42063@beast> <20180826061534.GT6515@ZenIV.linux.org.uk> <20180826173236.GU6515@ZenIV.linux.org.uk> <20180826225749.GY6515@ZenIV.linux.org.uk> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Aug 27, 2018 at 02:31:41PM -0700, Cong Wang wrote: > > I cant think of any challenges. Cong/Jiri? Would it require development > > time classifiers/actions/qdiscs to sit in that directory (I suspect you > > dont want them in include/net). > > BTW, the idea of improving grep-ability of the code by prefixing the > > ops appropriately makes sense. i.e we should have ops->cls_init, > > ops->act_init etc. > > Hmm? Isn't struct tcf_proto_ops used and must be provided > by each tc filter module? How does it work if you move it into > net/sched/* for out-of-tree modules? Are they supposed to > include "..../net/sched/tcf_proto.h"?? Or something else? If you care about out-of-tree modules, that could easily live in include/net/tcf_proto.h, provided that it's not pulled by indirect includes into hell knows how many places. Try make allmodconfig make >/dev/null 2>&1 find -name '.*.cmd'|xargs grep sch_generic.h That finds 2977 files here, most of them having nothing to do with net/sched. > BTW, we need some grep tool that really understands C syntax, > not making each variable friendly to plain grep. This isn't the matter of C syntax; it needs to handle C typization, and you really can't do that anywhere near reliably without looking at preprocessor output. Which very much depends upon .config... BTW, something odd in cls_u32.c: what happens if we have the following graph: tcf_proto , it's ->data being and ->root - tc_u_common , in its ->hlist , in its ->ht[0] and set ->ht_down in to the ? AFAICS, there's nothing to prevent that - TCA_U32_LINK being 0x80000000 will do just that. What happens upon u32_destroy() in that case? Unless I'm misreading that code, refcounts will be : 1 : 2 : 1 and in u32_destroy() we'll get this: root_ht = tp_c = if (root_ht && --root_ht->refcnt == 0) u32_destroy_hnode(tp, root_ht, extack); decrements refcnt to 1 and does nothing else. if (--tp_c->refcnt == 0) { is satisfied hlist_del(&tp_c->hnode); unhashed while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) { we take ht = u32_clear_hnode(tp, ht, extack); which does for (h = 0; h <= ht->divisor; h++) { while ((n = rtnl_dereference(ht->ht[h])) != NULL) { n = RCU_INIT_POINTER(ht->ht[h], rtnl_dereference(n->next)); remove from ->ht[0] tcf_unbind_filter(tp, &n->res); u32_remove_hw_knode(tp, n, extack); idr_remove(&ht->handle_idr, n->handle); if (tcf_exts_get_net(&n->exts)) tcf_queue_work(&n->rwork, u32_delete_key_freepf_work); else u32_destroy_key(n->tp, n, true); ... and we hit u32_destroy_key(, , true), which does struct tc_u_hnode *ht = rtnl_dereference(n->ht_down); ht = tcf_exts_destroy(&n->exts); tcf_exts_put_net(&n->exts); if (ht && --ht->refcnt == 0) kfree(ht); *NOW* ->refcnt is 0, and we free the damn thing. .... kfree(n); is freed and we return to u32_destroy_hnode() where we see that there's nothing else left in ->ht[...] and return to u32_destroy(). Where RCU_INIT_POINTER(tp_c->hlist, ht->next); sets ->hlist to ->next, aka . Which is already freed. /* u32_destroy_key() will later free ht for us, if it's * still referenced by some knode */ if (--ht->refcnt == 0) kfree_rcu(ht, rcu); ->refcnt reaches 0 and we free it (RCU-delayed) } ... and we go for the next iteration, this time with ht = . Doing all kinds of unsanitary things to the memory it used to occupy... Incidentally, if we hit tcf_queue_work(&n->rwork, u32_delete_key_freepf_work); instead of u32_destroy_key(), the things don't seem to be any better - we won't do anything to until rtnl is dropped, so u32_destroy() won't break on the second pass through the loop - it'll free there and return. Setting us up for trouble, since when u32_delete_key_freepf_work() finally gets to u32_destroy_key() we'll have ->ht_down pointing to freed memory and decrementing its contents... What am I missing in there? Is it just "we should never have ->ht_down pointing to anyone's ->root"? If so, I'm not sure how to detect that; if not... what should happen to the orphaned root_ht? Should it remain on the list? We might have two tcf_proto sharing tp->data, so tp_c and its list might very well survive the u32_destroy()... Note, BTW, that if we do leave the orphan on the list and later change the tc_u_knode so that ->ht_down doesn't point to that thing anymore, we'll get its refcount incremented to 2 in u32_init_knode(), then decremented to 1 by u32_set_parms() and then arrange for u32_delete_key_work() to be run. Which will drive the refcount to 0 and free the damn thing. While it's still in the middle of ->hlist...