From: David Laight
To: 'Jozsef Kadlecsik', Daniel Xu
CC: Pablo Neira Ayuso, Florian Westphal, "netfilter-devel@vger.kernel.org", "coreteam@netfilter.org", "netdev@vger.kernel.org", "linux-kernel@vger.kernel.org", "ppenkov@aviatrix.com"
Subject: RE: ip_set_hash_netiface
Date: Fri, 28 Oct 2022 07:51:28 +0000
Message-ID: <4a0da0bfe87b4e10a83b97508d3c853e@AcuMS.aculab.com>

From: Jozsef Kadlecsik
> Sent: 26 October 2022 13:26
>
> On Tue, 25 Oct 2022, Daniel Xu wrote:
>
> > I'm following up with our hallway chat yesterday about how ipset
> > hash:net,iface can easily OOM.
> >
> > Here's a quick reproducer (stolen from
> > https://bugzilla.kernel.org/show_bug.cgi?id=199107):
> >
> > $ ipset create ACL.IN.ALL_PERMIT hash:net,iface hashsize 1048576 timeout 0
> > $ for i in $(seq 0 100); do /sbin/ipset add ACL.IN.ALL_PERMIT 0.0.0.0/0,kaf_$i timeout 0 -exist; done
> >
> > This used to cause a NULL ptr deref panic before
> > https://github.com/torvalds/linux/commit/2b33d6ffa9e38f344418976b06 .
> >
> > Now it'll either allocate a huge amount of memory or fail a
> > vmalloc():
> >
> > [Tue Oct 25 00:13:08 2022] ipset: vmalloc error: size 1073741848, exceeds total pages
> > <...>
> > [Tue Oct 25 00:13:08 2022] Call Trace:
> > [Tue Oct 25 00:13:08 2022]
> > [Tue Oct 25 00:13:08 2022] dump_stack_lvl+0x48/0x60
> > [Tue Oct 25 00:13:08 2022] warn_alloc+0x155/0x180
> > [Tue Oct 25 00:13:08 2022] __vmalloc_node_range+0x72a/0x760
> > [Tue Oct 25 00:13:08 2022] ? hash_netiface4_add+0x7c0/0xb20
> > [Tue Oct 25 00:13:08 2022] ? __kmalloc_large_node+0x4a/0x90
> > [Tue Oct 25 00:13:08 2022] kvmalloc_node+0xa6/0xd0
> > [Tue Oct 25 00:13:08 2022] ? hash_netiface4_resize+0x99/0x710
> > <...>
> >
> > Note that this behavior is somewhat documented
> > (https://ipset.netfilter.org/ipset.man.html):
> >
> > > The internal restriction of the hash:net,iface set type is that the same
> > > network prefix cannot be stored with more than 64 different interfaces
> > > in a single set.
> >
> > I'm not sure how hard it would be to enforce a limit, but I think it would
> > be a bit better to error than allocate many GBs of memory.
>
> That's a bug, actually the limit is not enforced in spite of the
> documentation. The next patch fixes it and I'm going to submit to Pablo:
>
> diff --git a/net/netfilter/ipset/ip_set_hash_gen.h b/net/netfilter/ipset/ip_set_hash_gen.h
> index 6e391308431d..3f8853ed32e9 100644
> --- a/net/netfilter/ipset/ip_set_hash_gen.h
> +++ b/net/netfilter/ipset/ip_set_hash_gen.h
> @@ -61,10 +61,6 @@ tune_bucketsize(u8 curr, u32 multi)
>  	 */
>  	return n > curr && n <= AHASH_MAX_TUNED ? n : curr;
>  }
> -#define TUNE_BUCKETSIZE(h, multi)		\
> -	((h)->bucketsize = tune_bucketsize((h)->bucketsize, multi))
> -#else
> -#define TUNE_BUCKETSIZE(h, multi)
>  #endif
>
>  /* A hash bucket */
> @@ -936,7 +932,11 @@ mtype_add(struct ip_set *set, void *value, const struct ip_set_ext *ext,
>  		goto set_full;
>  	/* Create a new slot */
>  	if (n->pos >= n->size) {
> -		TUNE_BUCKETSIZE(h, multi);
> +#ifdef IP_SET_HASH_WITH_MULTI
> +		if (h->bucketsize >= AHASH_MAX_TUNED)
> +			goto set_full;
> +		h->bucketsize = tune_bucketsize(h->bucketsize, multi);
> +#endif

AFAICT this is the only call of tune_bucketsize().
It is defined just above TUNE_BUCKETSIZE as:

static u8
tune_bucketsize(u8 curr, u32 multi)
{
	u32 n;

	if (multi < curr)
		return curr;

	n = curr + AHASH_INIT_SIZE;
	/* Currently, at listing one hash bucket must fit into a message.
	 * Therefore we have a hard limit here.
	 */
	return n > curr && n <= AHASH_MAX_TUNED ? n : curr;
}

If I'm reading it correctly this is just:
	return curr >= multi || curr >= 64 ? curr : curr + 2;
(the 'n > curr' test is unconditionally true).

The extra check is limiting it to 12 (AHASH_MAX_TUNED) not 64.

Quite why the change makes a significant difference to the
validity of the kvmalloc() is another matter.
Changing a multiplier from 64 to 12 seems unlikely to be that
significant - if it is you wouldn't want to be multiplying by 12.

I've not looked at what 'multi' is, but I'm sort of surprised
it isn't used as the new bucketsize.

Also, it doesn't really look right to have lots of static
functions in a .h file?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)