2019-08-06 21:15:23

by Nathan Huckleberry

[permalink] [raw]
Subject: [RFC PATCH 1/2] Add clang-tidy and static analyzer support to makefile

Signed-off-by: Nathan Huckleberry <[email protected]>
---
These patches add clang-tidy and the clang static-analyzer as make
targets. The goal of these patches is to make static analysis tools
usable and extendable by any developer or researcher who is familiar
with basic c++.

The current static analysis tools require intimate knowledge of the internal
workings of the static analysis. Clang-tidy and the clang static analyzers
expose an easy to use api and allow users unfamiliar with clang to
write new checks with relative ease.

===Clang-tidy===

Clang-tidy is an easily extendable 'linter' that runs on the AST.
Clang-tidy checks are easy to write and understand. A check consists of
two parts, a matcher and a checker. The matcher is created using a
domain specific language that acts on the AST
(https://clang.llvm.org/docs/LibASTMatchersReference.html). When AST
nodes are found by the matcher a callback is made to the checker. The
checker can then execute additional checks and issue warnings.

Here is an example clang-tidy check to report functions that have calls
to local_irq_disable without calls to local_irq_enable and vice-versa.
Functions flagged with __attribute((annotation("ignore_irq_balancing")))
are ignored for analysis.

The full patch can be found here (https://reviews.llvm.org/D65828)

```
void IrqUnbalancedCheck::registerMatchers(MatchFinder *Finder) {
// finds calls to "arch_local_irq_disable" in a function body
auto disable =
forEachDescendant(
callExpr(
hasDeclaration(
namedDecl(
hasName("arch_local_irq_disable")))).bind("disable"));

// finds calls to "arch_local_irq_enable" in a function body
auto enable =
forEachDescendant(
callExpr(
hasDeclaration(
namedDecl(
hasName("arch_local_irq_enable")))).bind("enable"));

// Looks for function body that has the following property:
// ((disable && !enable) || (enable && !disable))
auto matcher = functionDecl(
anyOf(allOf(disable, unless(enable)), allOf(enable, unless(disable))));

Finder->addMatcher(matcher.bind("func"), this);
}

std::string annotation = "ignore_irq_balancing";
void IrqUnbalancedCheck::check(const MatchFinder::MatchResult &Result) {
const auto *MatchedDecl = Result.Nodes.getNodeAs<FunctionDecl>("func");
const auto *DisableCall = Result.Nodes.getNodeAs<CallExpr>("disable");
const auto *EnableCall = Result.Nodes.getNodeAs<CallExpr>("enable");

// If the function has __attribute((annotate("ignore_irq_balancing"))
for (const auto *Attr : MatchedDecl->attrs()) {
if (Attr->getKind() == clang::attr::Annotate) {
if(dyn_cast<AnnotateAttr>(Attr)->getAnnotation().str() == annotation) {
return;
}
}
}

if(EnableCall) {
diag(MatchedDecl->getLocation(), "call to 'enable_local_irq' without 'disable_local_irq' in %0 ")
<< MatchedDecl;
diag(EnableCall->getBeginLoc(), "call to 'enable_local_irq'", DiagnosticIDs::Note)
<< MatchedDecl;
}

if(DisableCall) {
diag(MatchedDecl->getLocation(), "call to 'disable_local_irq' without 'enable_local_irq' in %0 ")
<< MatchedDecl;
diag(DisableCall->getBeginLoc(), "call to 'disable_local_irq'", DiagnosticIDs::Note)
<< MatchedDecl;
}
}
```

===Clang static analyzer===

The clang static analyzer is a more powerful static analysis tool that
uses symbolic execution to find bugs. Currently there is a check that
looks for potential security bugs from invalid uses of kmalloc and
kfree. There are several more general purpose checks that are useful for
the kernel.

The clang static analyzer is well documented and designed to be
extensible.
(https://clang-analyzer.llvm.org/checker_dev_manual.html)
(https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf)


Why add clang-tidy and the clang static analyzer when other static
analyzers are already in the kernel?

The main draw of the clang tools is how accessible they are. The clang
documentation is very nice and these tools are built specifically to be
easily extendable by any developer. They provide an accessible method of
bug-finding and research to people who are not overly familiar with the
kernel codebase.

Makefile | 3 ++
scripts/clang-tools/Makefile.clang-tools | 35 ++++++++++++++
.../{ => clang-tools}/gen_compile_commands.py | 0
scripts/clang-tools/parse_compile_commands.py | 47 +++++++++++++++++++
4 files changed, 85 insertions(+)
create mode 100644 scripts/clang-tools/Makefile.clang-tools
rename scripts/{ => clang-tools}/gen_compile_commands.py (100%)
create mode 100755 scripts/clang-tools/parse_compile_commands.py

diff --git a/Makefile b/Makefile
index fabc127d127f..49f1d3fa48a8 100644
--- a/Makefile
+++ b/Makefile
@@ -709,6 +709,7 @@ KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)

include scripts/Makefile.kcov
include scripts/Makefile.gcc-plugins
+include scripts/clang-tools/Makefile.clang-tools

ifdef CONFIG_READABLE_ASM
# Disable optimizations that make assembler listings hard to read.
@@ -1470,6 +1471,8 @@ help:
@echo ' headers_check - Sanity check on exported headers'
@echo ' headerdep - Detect inclusion cycles in headers'
@echo ' coccicheck - Check with Coccinelle'
+ @echo ' clang-analyzer - Check with clang static analyzer'
+ @echo ' clang-tidy - Check with clang-tidy'
@echo ''
@echo 'Kernel selftest:'
@echo ' kselftest - Build and run kernel selftest (run as root)'
diff --git a/scripts/clang-tools/Makefile.clang-tools b/scripts/clang-tools/Makefile.clang-tools
new file mode 100644
index 000000000000..0adb6df20777
--- /dev/null
+++ b/scripts/clang-tools/Makefile.clang-tools
@@ -0,0 +1,35 @@
+# SPDX-License-Identifier: GPL-2.0
+PHONY += clang-tidy
+HAS_PARALLEL := $(shell (parallel --version 2> /dev/null) | grep 'GNU parallel' 2> /dev/null)
+clang-tidy:
+ifdef CONFIG_CC_IS_CLANG
+ $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
+ifdef HAS_PARALLEL
+ #Xargs interleaves multiprocessed output. GNU Parallel does not.
+ scripts/clang-tools/parse_compile_commands.py compile_commands.json \
+ | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,linuxkernel-*" {}'
+else
+ @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
+ scripts/clang-tools/parse_compile_commands.py compile_commands.json \
+ | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,linuxkernel-*' @"
+endif
+else
+ $(error Clang-tidy requires CC=clang)
+endif
+
+PHONY += clang-analyzer
+clang-analyzer:
+ifdef CONFIG_CC_IS_CLANG
+ $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
+ifdef HAS_PARALLEL
+ #Xargs interleaves multiprocessed output. GNU Parallel does not.
+ scripts/clang-tools/parse_compile_commands.py compile_commands.json \
+ | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,clang-analyzer-*" {}'
+else
+ @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
+ scripts/clang-tools/parse_compile_commands.py compile_commands.json \
+ | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,clang-analyzer-*' @"
+endif
+else
+ $(error Clang-analyzer requires CC=clang)
+endif
diff --git a/scripts/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
similarity index 100%
rename from scripts/gen_compile_commands.py
rename to scripts/clang-tools/gen_compile_commands.py
diff --git a/scripts/clang-tools/parse_compile_commands.py b/scripts/clang-tools/parse_compile_commands.py
new file mode 100755
index 000000000000..d6bc1bf9951e
--- /dev/null
+++ b/scripts/clang-tools/parse_compile_commands.py
@@ -0,0 +1,47 @@
+#!/usr/bin/env python
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) Google LLC, 2019
+#
+# Author: Nathan Huckleberry <[email protected]>
+#
+"""A helper routine for make clang-tidy to parse compile_commands.json."""
+
+import argparse
+import json
+import logging
+import os
+import re
+
+def parse_arguments():
+ """Set up and parses command-line arguments.
+ Returns:
+ file_path: Path to compile_commands.json file
+ """
+ usage = """Parse a compilation database and return a list of files
+ included in the database"""
+ parser = argparse.ArgumentParser(description=usage)
+
+ file_path_help = ('Path to the compilation database to parse')
+ parser.add_argument('file', type=str, help=file_path_help)
+
+ args = parser.parse_args()
+
+ return args.file
+
+
+def main():
+ filename = parse_arguments()
+
+ #Read JSON data into the datastore variable
+ if filename:
+ with open(filename, 'r') as f:
+ datastore = json.load(f)
+
+ #Use the new datastore datastructure
+ for entry in datastore:
+ print(entry['file'])
+
+
+if __name__ == '__main__':
+ main()
--
2.22.0.709.g102302147b-goog


2019-08-06 21:15:31

by Nathan Huckleberry

[permalink] [raw]
Subject: [RFC PATCH 2/2] Add clang-tidy irq-balancing annotations for arm64

This patch is not intended to be merged. It simply shows the scale of
the effort involved in applying annotations in the case of intentionally
unbalanced calls to local_irq_disable.

Signed-off-by: Nathan Huckleberry <[email protected]>
---
arch/arm64/kernel/debug-monitors.c | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/kernel/process.c | 4 ++++
arch/arm64/kernel/smp.c | 1 +
drivers/tty/sysrq.c | 1 +
include/linux/compiler_attributes.h | 7 +++++++
init/main.c | 1 +
kernel/irq/handle.c | 1 +
kernel/power/suspend.c | 2 ++
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 1 +
kernel/sched/idle.c | 6 ++++++
kernel/stop_machine.c | 1 +
mm/migrate.c | 1 +
net/core/dev.c | 1 +
15 files changed, 30 insertions(+)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index f8719bd30850..13552daeadd7 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -221,6 +221,7 @@ static int call_step_hook(struct pt_regs *regs, unsigned int esr)
}
NOKPROBE_SYMBOL(call_step_hook);

+__unbalanced_irq
static void send_user_sigtrap(int si_code)
{
struct pt_regs *regs = current_pt_regs();
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index 0df8493624e0..7880c3d336d5 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -258,6 +258,7 @@ static void machine_kexec_mask_interrupts(void)
/**
* machine_crash_shutdown - shutdown non-crashing cpus and save registers
*/
+__unbalanced_irq
void machine_crash_shutdown(struct pt_regs *regs)
{
local_irq_disable();
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9856395ccdb7..30cec11dc82b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -111,6 +111,7 @@ void cpu_do_idle(void)
/*
* This is our default idle handler.
*/
+__unbalanced_irq
void arch_cpu_idle(void)
{
/*
@@ -149,6 +150,7 @@ void machine_shutdown(void)
* activity (executing tasks, handling interrupts). smp_send_stop()
* achieves this.
*/
+__unbalanced_irq
void machine_halt(void)
{
local_irq_disable();
@@ -162,6 +164,7 @@ void machine_halt(void)
* achieves this. When the system power is turned off, it will take all CPUs
* with it.
*/
+__unbalanced_irq
void machine_power_off(void)
{
local_irq_disable();
@@ -179,6 +182,7 @@ void machine_power_off(void)
* executing pre-reset code, and using RAM that the primary CPU's code wishes
* to use. Implementing such co-ordination would be essentially impossible.
*/
+__unbalanced_irq
void machine_restart(char *cmd)
{
/* Disable interrupts first */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 6dcf9607d770..78da6f7b5bcd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -852,6 +852,7 @@ static void ipi_cpu_stop(unsigned int cpu)
static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
#endif

+__unbalanced_irq
static void ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
{
#ifdef CONFIG_KEXEC_CORE
diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 573b2055173c..53bac6cb39c3 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -146,6 +146,7 @@ static struct sysrq_key_op sysrq_crash_op = {
.enable_mask = SYSRQ_ENABLE_DUMP,
};

+__unbalanced_irq
static void sysrq_handle_reboot(int key)
{
lockdep_off();
diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h
index 6b318efd8a74..e2aaff9a2017 100644
--- a/include/linux/compiler_attributes.h
+++ b/include/linux/compiler_attributes.h
@@ -253,4 +253,11 @@
*/
#define __weak __attribute__((__weak__))

+
+#if __has_attribute(__annotate__)
+# define __unbalanced_irq __attribute__((annotate("ignore_irq_balancing")))
+#else
+# define __unbalanced_irq
+#endif
+
#endif /* __LINUX_COMPILER_ATTRIBUTES_H */
diff --git a/init/main.c b/init/main.c
index 66a196c5e4c3..360a2714afe3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -902,6 +902,7 @@ static inline void do_trace_initcall_finish(initcall_t fn, int ret)
}
#endif /* !TRACEPOINTS_ENABLED */

+__unbalanced_irq
int __init_or_module do_one_initcall(initcall_t fn)
{
int count = preempt_count();
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index a4ace611f47f..a74309c071b7 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -134,6 +134,7 @@ void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
wake_up_process(action->thread);
}

+__unbalanced_irq
irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
irqreturn_t retval = IRQ_NONE;
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 096211299c07..41bc4b9b4f0e 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -380,12 +380,14 @@ static int suspend_prepare(suspend_state_t state)
}

/* default implementation */
+__unbalanced_irq
void __weak arch_suspend_disable_irqs(void)
{
local_irq_disable();
}

/* default implementation */
+__unbalanced_irq
void __weak arch_suspend_enable_irqs(void)
{
local_irq_enable();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 874c427742a9..465d4ef60e1a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3366,6 +3366,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
*
* WARNING: must be called with preemption disabled!
*/
+__unbalanced_irq
static void __sched notrace __schedule(bool preempt)
{
struct task_struct *prev, *next;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f5e528..9f147a2239ac 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9334,6 +9334,7 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
* least 1 task to be running on each physical CPU where possible, and
* avoids physical / logical imbalances.
*/
+__unbalanced_irq
static int active_load_balance_cpu_stop(void *data)
{
struct rq *busiest_rq = data;
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 80940939b733..b2174f71f1aa 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,6 +52,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

+__unbalanced_irq
static noinline int __cpuidle cpu_idle_poll(void)
{
rcu_idle_enter();
@@ -74,6 +75,7 @@ void __weak arch_cpu_idle_prepare(void) { }
void __weak arch_cpu_idle_enter(void) { }
void __weak arch_cpu_idle_exit(void) { }
void __weak arch_cpu_idle_dead(void) { }
+__unbalanced_irq
void __weak arch_cpu_idle(void)
{
cpu_idle_force_poll = 1;
@@ -85,6 +87,7 @@ void __weak arch_cpu_idle(void)
*
* To use when the cpuidle framework cannot be used.
*/
+__unbalanced_irq
void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
@@ -96,6 +99,7 @@ void __cpuidle default_idle_call(void)
}
}

+__unbalanced_irq
static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
int next_state)
{
@@ -126,6 +130,7 @@ static int call_cpuidle(struct cpuidle_driver *drv, struct cpuidle_device *dev,
* set, and it returns with polling set. If it ever stops polling, it
* must clear the polling bit.
*/
+__unbalanced_irq
static void cpuidle_idle_call(void)
{
struct cpuidle_device *dev = cpuidle_get_device();
@@ -222,6 +227,7 @@ static void cpuidle_idle_call(void)
*
* Called with polling cleared.
*/
+__unbalanced_irq
static void do_idle(void)
{
int cpu = smp_processor_id();
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 2b5a6754646f..1ff5f63b5f85 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -178,6 +178,7 @@ static void ack_state(struct multi_stop_data *msdata)
}

/* This is the cpu_stop function which stops the CPU. */
+__unbalanced_irq
static int multi_cpu_stop(void *data)
{
struct multi_stop_data *msdata = data;
diff --git a/mm/migrate.c b/mm/migrate.c
index f2ecc2855a12..bf2fd8e2c0f4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -396,6 +396,7 @@ static int expected_page_refs(struct address_space *mapping, struct page *page)
* 2 for pages with a mapping
* 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
*/
+__unbalanced_irq
int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page, enum migrate_mode mode,
int extra_count)
diff --git a/net/core/dev.c b/net/core/dev.c
index d6edd218babd..04906dc6bd00 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5885,6 +5885,7 @@ static void net_rps_send_ipi(struct softnet_data *remsd)
* net_rps_action_and_irq_enable sends any pending IPI's for rps.
* Note: called with local irq disabled, but exits with local irq enabled.
*/
+__unbalanced_irq
static void net_rps_action_and_irq_enable(struct softnet_data *sd)
{
#ifdef CONFIG_RPS
--
2.22.0.709.g102302147b-goog

2019-08-29 19:19:15

by Kees Cook

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] Add clang-tidy and static analyzer support to makefile

[ Added kbuild and Masahiro to CC ... ]

On Tue, Aug 06, 2019 at 02:10:46PM -0700, Nathan Huckleberry wrote:
> Signed-off-by: Nathan Huckleberry <[email protected]>
> ---

These two lines should be at the end of your commit log. :)

> These patches add clang-tidy and the clang static-analyzer as make
> targets. The goal of these patches is to make static analysis tools
> usable and extendable by any developer or researcher who is familiar
> with basic c++.
>
> The current static analysis tools require intimate knowledge of the internal
> workings of the static analysis. Clang-tidy and the clang static analyzers
> expose an easy to use api and allow users unfamiliar with clang to
> write new checks with relative ease.
>
> ===Clang-tidy===
>
> Clang-tidy is an easily extendable 'linter' that runs on the AST.
> Clang-tidy checks are easy to write and understand. A check consists of
> two parts, a matcher and a checker. The matcher is created using a
> domain specific language that acts on the AST
> (https://clang.llvm.org/docs/LibASTMatchersReference.html). When AST
> nodes are found by the matcher a callback is made to the checker. The
> checker can then execute additional checks and issue warnings.
>
> Here is an example clang-tidy check to report functions that have calls
> to local_irq_disable without calls to local_irq_enable and vice-versa.
> Functions flagged with __attribute((annotation("ignore_irq_balancing")))
> are ignored for analysis.
>
> The full patch can be found here (https://reviews.llvm.org/D65828)
>
> ```
> void IrqUnbalancedCheck::registerMatchers(MatchFinder *Finder) {
> // finds calls to "arch_local_irq_disable" in a function body
> auto disable =
> forEachDescendant(
> callExpr(
> hasDeclaration(
> namedDecl(
> hasName("arch_local_irq_disable")))).bind("disable"));
>
> // finds calls to "arch_local_irq_enable" in a function body
> auto enable =
> forEachDescendant(
> callExpr(
> hasDeclaration(
> namedDecl(
> hasName("arch_local_irq_enable")))).bind("enable"));
>
> // Looks for function body that has the following property:
> // ((disable && !enable) || (enable && !disable))
> auto matcher = functionDecl(
> anyOf(allOf(disable, unless(enable)), allOf(enable, unless(disable))));
>
> Finder->addMatcher(matcher.bind("func"), this);
> }
>
> std::string annotation = "ignore_irq_balancing";
> void IrqUnbalancedCheck::check(const MatchFinder::MatchResult &Result) {
> const auto *MatchedDecl = Result.Nodes.getNodeAs<FunctionDecl>("func");
> const auto *DisableCall = Result.Nodes.getNodeAs<CallExpr>("disable");
> const auto *EnableCall = Result.Nodes.getNodeAs<CallExpr>("enable");
>
> // If the function has __attribute((annotate("ignore_irq_balancing"))
> for (const auto *Attr : MatchedDecl->attrs()) {
> if (Attr->getKind() == clang::attr::Annotate) {
> if(dyn_cast<AnnotateAttr>(Attr)->getAnnotation().str() == annotation) {
> return;
> }
> }
> }
>
> if(EnableCall) {
> diag(MatchedDecl->getLocation(), "call to 'enable_local_irq' without 'disable_local_irq' in %0 ")
> << MatchedDecl;
> diag(EnableCall->getBeginLoc(), "call to 'enable_local_irq'", DiagnosticIDs::Note)
> << MatchedDecl;
> }
>
> if(DisableCall) {
> diag(MatchedDecl->getLocation(), "call to 'disable_local_irq' without 'enable_local_irq' in %0 ")
> << MatchedDecl;
> diag(DisableCall->getBeginLoc(), "call to 'disable_local_irq'", DiagnosticIDs::Note)
> << MatchedDecl;
> }
> }
> ```
>
> ===Clang static analyzer===
>
> The clang static analyzer is a more powerful static analysis tool that
> uses symbolic execution to find bugs. Currently there is a check that
> looks for potential security bugs from invalid uses of kmalloc and
> kfree. There are several more general purpose checks that are useful for
> the kernel.
>
> The clang static analyzer is well documented and designed to be
> extensible.
> (https://clang-analyzer.llvm.org/checker_dev_manual.html)
> (https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf)
>
>
> Why add clang-tidy and the clang static analyzer when other static
> analyzers are already in the kernel?
>
> The main draw of the clang tools is how accessible they are. The clang
> documentation is very nice and these tools are built specifically to be
> easily extendable by any developer. They provide an accessible method of
> bug-finding and research to people who are not overly familiar with the
> kernel codebase.
>

^ i.e. they should be right here...

> Makefile | 3 ++
> scripts/clang-tools/Makefile.clang-tools | 35 ++++++++++++++
> .../{ => clang-tools}/gen_compile_commands.py | 0
> scripts/clang-tools/parse_compile_commands.py | 47 +++++++++++++++++++
> 4 files changed, 85 insertions(+)
> create mode 100644 scripts/clang-tools/Makefile.clang-tools
> rename scripts/{ => clang-tools}/gen_compile_commands.py (100%)
> create mode 100755 scripts/clang-tools/parse_compile_commands.py
>
> diff --git a/Makefile b/Makefile
> index fabc127d127f..49f1d3fa48a8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -709,6 +709,7 @@ KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)
>
> include scripts/Makefile.kcov
> include scripts/Makefile.gcc-plugins
> +include scripts/clang-tools/Makefile.clang-tools
>
> ifdef CONFIG_READABLE_ASM
> # Disable optimizations that make assembler listings hard to read.
> @@ -1470,6 +1471,8 @@ help:
> @echo ' headers_check - Sanity check on exported headers'
> @echo ' headerdep - Detect inclusion cycles in headers'
> @echo ' coccicheck - Check with Coccinelle'
> + @echo ' clang-analyzer - Check with clang static analyzer'
> + @echo ' clang-tidy - Check with clang-tidy'

I think this is great; more people will be able to run these tests.

> @echo ''
> @echo 'Kernel selftest:'
> @echo ' kselftest - Build and run kernel selftest (run as root)'
> diff --git a/scripts/clang-tools/Makefile.clang-tools b/scripts/clang-tools/Makefile.clang-tools
> new file mode 100644
> index 000000000000..0adb6df20777
> --- /dev/null
> +++ b/scripts/clang-tools/Makefile.clang-tools
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0
> +PHONY += clang-tidy
> +HAS_PARALLEL := $(shell (parallel --version 2> /dev/null) | grep 'GNU parallel' 2> /dev/null)
> +clang-tidy:
> +ifdef CONFIG_CC_IS_CLANG
> + $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
> +ifdef HAS_PARALLEL

Is it worth falling back to xargs? Are there builders where clang-tidy
or clang-analyzer are installed but parallel isn't? (i.e. it might just
be better to simply require parallel.) Note that there's no test for
python3 -- we just try to run it. :)

> + #Xargs interleaves multiprocessed output. GNU Parallel does not.
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,linuxkernel-*" {}'
> +else
> + @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,linuxkernel-*' @"
> +endif
> +else
> + $(error Clang-tidy requires CC=clang)
> +endif
> +
> +PHONY += clang-analyzer
> +clang-analyzer:
> +ifdef CONFIG_CC_IS_CLANG
> + $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
> +ifdef HAS_PARALLEL
> + #Xargs interleaves multiprocessed output. GNU Parallel does not.
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,clang-analyzer-*" {}'
> +else
> + @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,clang-analyzer-*' @"
> +endif
> +else
> + $(error Clang-analyzer requires CC=clang)
> +endif
> diff --git a/scripts/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> similarity index 100%
> rename from scripts/gen_compile_commands.py
> rename to scripts/clang-tools/gen_compile_commands.py
> diff --git a/scripts/clang-tools/parse_compile_commands.py b/scripts/clang-tools/parse_compile_commands.py
> new file mode 100755
> index 000000000000..d6bc1bf9951e
> --- /dev/null
> +++ b/scripts/clang-tools/parse_compile_commands.py
> @@ -0,0 +1,47 @@
> +#!/usr/bin/env python
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) Google LLC, 2019
> +#
> +# Author: Nathan Huckleberry <[email protected]>
> +#
> +"""A helper routine for make clang-tidy to parse compile_commands.json."""
> +
> +import argparse
> +import json
> +import logging
> +import os
> +import re
> +
> +def parse_arguments():
> + """Set up and parses command-line arguments.
> + Returns:
> + file_path: Path to compile_commands.json file
> + """
> + usage = """Parse a compilation database and return a list of files
> + included in the database"""
> + parser = argparse.ArgumentParser(description=usage)
> +
> + file_path_help = ('Path to the compilation database to parse')
> + parser.add_argument('file', type=str, help=file_path_help)
> +
> + args = parser.parse_args()
> +
> + return args.file
> +
> +
> +def main():
> + filename = parse_arguments()
> +
> + #Read JSON data into the datastore variable
> + if filename:
> + with open(filename, 'r') as f:
> + datastore = json.load(f)
> +
> + #Use the new datastore datastructure
> + for entry in datastore:
> + print(entry['file'])
> +
> +
> +if __name__ == '__main__':
> + main()

And this is nice to have because now there is a real consumer of
gen_compile_commands.py.

--
Kees Cook

2019-09-03 16:25:32

by Masahiro Yamada

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] Add clang-tidy and static analyzer support to makefile

On Wed, Aug 7, 2019 at 6:13 AM Nathan Huckleberry <[email protected]> wrote:
>
> Signed-off-by: Nathan Huckleberry <[email protected]>
> ---
> These patches add clang-tidy and the clang static-analyzer as make
> targets. The goal of these patches is to make static analysis tools
> usable and extendable by any developer or researcher who is familiar
> with basic c++.
>
> The current static analysis tools require intimate knowledge of the internal
> workings of the static analysis. Clang-tidy and the clang static analyzers
> expose an easy to use api and allow users unfamiliar with clang to
> write new checks with relative ease.
>
> ===Clang-tidy===
>
> Clang-tidy is an easily extendable 'linter' that runs on the AST.
> Clang-tidy checks are easy to write and understand. A check consists of
> two parts, a matcher and a checker. The matcher is created using a
> domain specific language that acts on the AST
> (https://clang.llvm.org/docs/LibASTMatchersReference.html). When AST
> nodes are found by the matcher a callback is made to the checker. The
> checker can then execute additional checks and issue warnings.
>
> Here is an example clang-tidy check to report functions that have calls
> to local_irq_disable without calls to local_irq_enable and vice-versa.
> Functions flagged with __attribute((annotation("ignore_irq_balancing")))
> are ignored for analysis.
>
> The full patch can be found here (https://reviews.llvm.org/D65828)
>
> ```
> void IrqUnbalancedCheck::registerMatchers(MatchFinder *Finder) {
> // finds calls to "arch_local_irq_disable" in a function body
> auto disable =
> forEachDescendant(
> callExpr(
> hasDeclaration(
> namedDecl(
> hasName("arch_local_irq_disable")))).bind("disable"));
>
> // finds calls to "arch_local_irq_enable" in a function body
> auto enable =
> forEachDescendant(
> callExpr(
> hasDeclaration(
> namedDecl(
> hasName("arch_local_irq_enable")))).bind("enable"));
>
> // Looks for function body that has the following property:
> // ((disable && !enable) || (enable && !disable))
> auto matcher = functionDecl(
> anyOf(allOf(disable, unless(enable)), allOf(enable, unless(disable))));
>
> Finder->addMatcher(matcher.bind("func"), this);
> }
>
> std::string annotation = "ignore_irq_balancing";
> void IrqUnbalancedCheck::check(const MatchFinder::MatchResult &Result) {
> const auto *MatchedDecl = Result.Nodes.getNodeAs<FunctionDecl>("func");
> const auto *DisableCall = Result.Nodes.getNodeAs<CallExpr>("disable");
> const auto *EnableCall = Result.Nodes.getNodeAs<CallExpr>("enable");
>
> // If the function has __attribute((annotate("ignore_irq_balancing"))
> for (const auto *Attr : MatchedDecl->attrs()) {
> if (Attr->getKind() == clang::attr::Annotate) {
> if(dyn_cast<AnnotateAttr>(Attr)->getAnnotation().str() == annotation) {
> return;
> }
> }
> }
>
> if(EnableCall) {
> diag(MatchedDecl->getLocation(), "call to 'enable_local_irq' without 'disable_local_irq' in %0 ")
> << MatchedDecl;
> diag(EnableCall->getBeginLoc(), "call to 'enable_local_irq'", DiagnosticIDs::Note)
> << MatchedDecl;
> }
>
> if(DisableCall) {
> diag(MatchedDecl->getLocation(), "call to 'disable_local_irq' without 'enable_local_irq' in %0 ")
> << MatchedDecl;
> diag(DisableCall->getBeginLoc(), "call to 'disable_local_irq'", DiagnosticIDs::Note)
> << MatchedDecl;
> }
> }
> ```
>
> ===Clang static analyzer===
>
> The clang static analyzer is a more powerful static analysis tool that
> uses symbolic execution to find bugs. Currently there is a check that
> looks for potential security bugs from invalid uses of kmalloc and
> kfree. There are several more general purpose checks that are useful for
> the kernel.
>
> The clang static analyzer is well documented and designed to be
> extensible.
> (https://clang-analyzer.llvm.org/checker_dev_manual.html)
> (https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf)
>
>
> Why add clang-tidy and the clang static analyzer when other static
> analyzers are already in the kernel?
>
> The main draw of the clang tools is how accessible they are. The clang
> documentation is very nice and these tools are built specifically to be
> easily extendable by any developer. They provide an accessible method of
> bug-finding and research to people who are not overly familiar with the
> kernel codebase.
>
> Makefile | 3 ++
> scripts/clang-tools/Makefile.clang-tools | 35 ++++++++++++++
> .../{ => clang-tools}/gen_compile_commands.py | 0
> scripts/clang-tools/parse_compile_commands.py | 47 +++++++++++++++++++
> 4 files changed, 85 insertions(+)
> create mode 100644 scripts/clang-tools/Makefile.clang-tools
> rename scripts/{ => clang-tools}/gen_compile_commands.py (100%)
> create mode 100755 scripts/clang-tools/parse_compile_commands.py
>
> diff --git a/Makefile b/Makefile
> index fabc127d127f..49f1d3fa48a8 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -709,6 +709,7 @@ KBUILD_CFLAGS += $(call cc-option,--param=allow-store-data-races=0)
>
> include scripts/Makefile.kcov
> include scripts/Makefile.gcc-plugins
> +include scripts/clang-tools/Makefile.clang-tools
>
> ifdef CONFIG_READABLE_ASM
> # Disable optimizations that make assembler listings hard to read.
> @@ -1470,6 +1471,8 @@ help:
> @echo ' headers_check - Sanity check on exported headers'
> @echo ' headerdep - Detect inclusion cycles in headers'
> @echo ' coccicheck - Check with Coccinelle'
> + @echo ' clang-analyzer - Check with clang static analyzer'
> + @echo ' clang-tidy - Check with clang-tidy'
> @echo ''
> @echo 'Kernel selftest:'
> @echo ' kselftest - Build and run kernel selftest (run as root)'
> diff --git a/scripts/clang-tools/Makefile.clang-tools b/scripts/clang-tools/Makefile.clang-tools
> new file mode 100644
> index 000000000000..0adb6df20777
> --- /dev/null
> +++ b/scripts/clang-tools/Makefile.clang-tools
> @@ -0,0 +1,35 @@
> +# SPDX-License-Identifier: GPL-2.0
> +PHONY += clang-tidy
> +HAS_PARALLEL := $(shell (parallel --version 2> /dev/null) | grep 'GNU parallel' 2> /dev/null)
> +clang-tidy:
> +ifdef CONFIG_CC_IS_CLANG
> + $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
> +ifdef HAS_PARALLEL
> + #Xargs interleaves multiprocessed output. GNU Parallel does not.
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,linuxkernel-*" {}'
> +else
> + @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,linuxkernel-*' @"
> +endif
> +else
> + $(error Clang-tidy requires CC=clang)
> +endif


What is the benefit for wiring this to the Makefile?

Is there any drawback if you implement this in a small
independent shell script or something?


You are already adding a new python script
just for reading the json file.

I would invoke clang-{tidy,analyzer} from that python script.

If you unify these into a python script,
you can support the multiprocessing with a
python library ('import multiprocessing')
without GNU parallel.


Thanks.



> +PHONY += clang-analyzer
> +clang-analyzer:
> +ifdef CONFIG_CC_IS_CLANG
> + $(PYTHON3) scripts/clang-tools/gen_compile_commands.py
> +ifdef HAS_PARALLEL
> + #Xargs interleaves multiprocessed output. GNU Parallel does not.
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | parallel -k -j $(shell nproc) 'echo {} && clang-tidy -p . "-checks=-*,clang-analyzer-*" {}'
> +else
> + @echo "GNU parallel is not installed. Defaulting to non-parallelized runs"
> + scripts/clang-tools/parse_compile_commands.py compile_commands.json \
> + | xargs -L 1 -I@ sh -c "echo '@' && clang-tidy -p . '-checks=-*,clang-analyzer-*' @"
> +endif
> +else
> + $(error Clang-analyzer requires CC=clang)
> +endif
> diff --git a/scripts/gen_compile_commands.py b/scripts/clang-tools/gen_compile_commands.py
> similarity index 100%
> rename from scripts/gen_compile_commands.py
> rename to scripts/clang-tools/gen_compile_commands.py
> diff --git a/scripts/clang-tools/parse_compile_commands.py b/scripts/clang-tools/parse_compile_commands.py
> new file mode 100755
> index 000000000000..d6bc1bf9951e
> --- /dev/null
> +++ b/scripts/clang-tools/parse_compile_commands.py
> @@ -0,0 +1,47 @@
> +#!/usr/bin/env python
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (C) Google LLC, 2019
> +#
> +# Author: Nathan Huckleberry <[email protected]>
> +#
> +"""A helper routine for make clang-tidy to parse compile_commands.json."""
> +
> +import argparse
> +import json
> +import logging
> +import os
> +import re
> +
> +def parse_arguments():
> + """Set up and parses command-line arguments.
> + Returns:
> + file_path: Path to compile_commands.json file
> + """
> + usage = """Parse a compilation database and return a list of files
> + included in the database"""
> + parser = argparse.ArgumentParser(description=usage)
> +
> + file_path_help = ('Path to the compilation database to parse')
> + parser.add_argument('file', type=str, help=file_path_help)
> +
> + args = parser.parse_args()
> +
> + return args.file
> +
> +
> +def main():
> + filename = parse_arguments()
> +
> + #Read JSON data into the datastore variable
> + if filename:
> + with open(filename, 'r') as f:
> + datastore = json.load(f)
> +
> + #Use the new datastore datastructure
> + for entry in datastore:
> + print(entry['file'])
> +
> +
> +if __name__ == '__main__':
> + main()
> --
> 2.22.0.709.g102302147b-goog
>


--
Best Regards
Masahiro Yamada