Received: by 2002:a25:c593:0:0:0:0:0 with SMTP id v141csp441333ybe; Wed, 4 Sep 2019 23:47:22 -0700 (PDT) X-Google-Smtp-Source: APXvYqzYKvPsHwyKhbFGbv+YzyUnqrXSGynE2oFCV5q5yQELHMJN8ylcHpZj97roLIS36snvedrr X-Received: by 2002:a63:184b:: with SMTP id 11mr1851050pgy.112.1567666042249; Wed, 04 Sep 2019 23:47:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1567666042; cv=none; d=google.com; s=arc-20160816; b=o0odwH/fXk6II8bAoGGYgebXxmk9VLa39z8MQ8EQYswDNcBgvrKVXF19rBDcs9YgpG /8AW7OVzUFwQj/IITTgnoHxgz8hTa/m4/regFnj7HRHfeVsZR59pG8xWuVtSfxOIhBCf 54QHrxrkHupJLCsOn914nSdk3103y3FXEdrtXUyaE3qx+/mm+oOsWMUteS1Vr9WWvGnE yWvFROX3qqanxP9VUCwXSKI59rSowhPWxZEIzBldQgKWiW8uy2oT39u7z+ZVS0yh+pi9 ya6gRjNQf5b4kSuxToguaZOyFiE2emz7GrReqAeXY4hfEPFKM4V7TzGV4bGv1fOSknMX O/JA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:mime-version:user-agent:references :message-id:in-reply-to:subject:cc:to:from:date; bh=mZxFxJ+/Vk6rPYmVYqUz6VTgi9ZYfMjjr7v0yWufBXM=; b=N0isqpopRSVLnnY8hhMOxJuZeE8r6QP6ZOZVYrIkvFU1BWjrWYO0aTI09tDyy/N/8k Gl6xxBhfAtZNPKBtYvtDydDa2jR0X7WGSJGO/CIahdj5ljPyioTsTlbhY/NJAo97f0c/ hgj37O/Fun2ZlXSxDGsgQrfTVruCWJfHJjySjlwCpUpMJ5EXISthCUDyAcrENvRAclEt MSNkZun74oMHMq9jjE7JcRewlFOlUa/rodmj10FqFEwEjFcaWNnw6wkm/+FARM2wvTTk 4UjiwBp2TbMrXH2cxmYF3IVljj/PUmlexOHiI1Qxu/UPU+H2nDMOrqmgR8ezwoHe6r6V htjA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u17si956320pgg.204.2019.09.04.23.47.05; Wed, 04 Sep 2019 23:47:22 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1731501AbfIEGUK (ORCPT + 99 others); Thu, 5 Sep 2019 02:20:10 -0400 Received: from mail3-relais-sop.national.inria.fr ([192.134.164.104]:51891 "EHLO mail3-relais-sop.national.inria.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1731251AbfIEGUG (ORCPT ); Thu, 5 Sep 2019 02:20:06 -0400 X-IronPort-AV: E=Sophos;i="5.64,469,1559512800"; d="scan'208";a="318334487" Received: from abo-12-105-68.mrs.modulonet.fr (HELO hadrien) ([85.68.105.12]) by mail3-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 05 Sep 2019 08:20:03 +0200 Date: Thu, 5 Sep 2019 08:20:02 +0200 (CEST) From: Julia Lawall X-X-Sender: jll@hadrien To: Denis Efremov cc: Coccinelle , Gilles Muller , Nicolas Palix , Michal Marek , linux-kernel@vger.kernel.org, tacingiht@gmail.com Subject: Re: [RFC PATCH] coccinelle: check for integer overflow in binary search In-Reply-To: <20190904221223.5281-1-efremov@linux.com> Message-ID: References: <20190904221223.5281-1-efremov@linux.com> User-Agent: Alpine 2.21 (DEB 202 2017-01-01) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 5 Sep 2019, Denis Efremov wrote: > This is an RFC. I will resend the patch after feedback. Currently > I'm preparing big patchset with bsearch warnings fixed. The rule will > be a part of this patchset if it will be considered good enough for > checking. > > There is a known integer overflow error [1] in the binary search > algorithm. Google faced it in 2006 [2]. This rule checks midpoint > calculation in binary search for overflow, i.e., (l + h) / 2. > Not every match is an actual error since the array could be small > enough. However, a custom implementation of binary search is > error-prone and it's better to use the library function (lib/bsearch.c) > or to apply defensive programming for midpoint calculation. > > [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues > [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html > > Signed-off-by: Denis Efremov > --- > scripts/coccinelle/misc/bsearch.cocci | 80 +++++++++++++++++++++++++++ > 1 file changed, 80 insertions(+) > create mode 100644 scripts/coccinelle/misc/bsearch.cocci > > diff --git a/scripts/coccinelle/misc/bsearch.cocci b/scripts/coccinelle/misc/bsearch.cocci > new file mode 100644 > index 000000000000..a99d9a8d3ee5 > --- /dev/null > +++ b/scripts/coccinelle/misc/bsearch.cocci > @@ -0,0 +1,80 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/// Check midpoint calculation in binary search algorithm for integer overflow > +/// error [1]. Google faced it in 2006 [2]. Not every match is an actual error > +/// since the array can be small enough. However, a custom implementation of > +/// binary search is error-prone and it's better to use the library function > +/// (lib/bsearch.c) or to apply defensive programming for midpoint calculation. > +/// > +/// [1] https://en.wikipedia.org/wiki/Binary_search_algorithm#Implementation_issues > +/// [2] https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html > +// > +// Confidence: Medium > +// Copyright: (C) 2019 Denis Efremov, ISPRAS > +// Comments: > +// Options: --no-includes --include-headers > + > +virtual report > +virtual org > + > +@r depends on org || report@ > +identifier l, h, m; > +statement S; > +position p; > +// to match 1 in << > +// to match 2 in / > +// Can't use exact values, e.g. 2, because it fails to match 2L. > +// TODO: Is there an isomorphism for 2, 2L, 2U, 2UL, 2ULL, etc? > +constant c; As far as I can see, you aren't checking for 2 at all at the moment? You should be able to say constant c = {2, 2L, etc};. Actually, we do consider several variants of 0, so it could be reasonable to allow eg 2 to match other variants as well. > +// TODO: Is there an isomorphism for (a / 2) === (a >> 1)? No. I'm not sure it's common enough to be worth it. But it could be added if it seems like a good idea. > +@@ > +( > + while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)) { > + ... > +( > + ((l + h)@p / c) > +| > + ((l + h)@p >> c) > +) > + ... > + } > +//TODO: Is it possible to match "do {} while ();" loops? Not at the moment. There is some work on it, but it is not complete (adding Evan in CC). julia > +// | > +// do { > +// ... > +// ( > +// ((l + h)@p / c) > +// | > +// ((l + h)@p >> c) > +// ) > +// ... > +// } while (\(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\)); > +| > + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); > + m = \(((l + h)@p / c)\|((l + h)@p >> c)\)) S > +| > + for (...; \(l < h\|l <= h\|(h - l) > 1\|(l + 1) < h\|l < (h - 1)\); ...) { > + ... > +( > + ((l + h)@p / c) > +| > + ((l + h)@p >> c) > +) > + ... > + } > +) > + > +@script:python depends on org@ > +p << r.p; > +@@ > + > +coccilib.org.print_todo(p[0], "WARNING opportunity for bsearch() or 'm = l + (h - l) / 2;'") > + > +@script:python depends on report@ > +p << r.p; > +@@ > + > +msg="WARNING: custom implementation of bsearch is error-prone. " > +msg+="Consider using lib/bsearch.c or fix the midpoint calculation " > +msg+="as 'm = l + (h - l) / 2;' to prevent the arithmetic overflow." > +coccilib.report.print_report(p[0], msg) > + > -- > 2.21.0 > >