2015-05-05 10:21:15

by Nicholas Mc Guire

[permalink] [raw]
Subject: [PATCH RFC] Coccinelle: Check for return not matching function signature

Check if the signature of a function and the return value type match. This
is currently not the case in more than 2300 functions. In many cases this
mismatch will have no side-effects but in some cases it may lead to hard
to locate problems - and for readability and code understanding it is also
helpful when types match.

Signed-off-by: Nicholas Mc Guire <[email protected]>
---

Did not see any false positives in a scan of 4.0-rc2 (but did not check
all results either).

Not sure if scripts/coccinelle/tests/ is the right place for this
It did not seem to trigger any false positives but in some cases (notably
void pointers) the warning might be unnecessary.

The output is a bit lengthy - not sure if that is too much but it seemed
useful to me to see the non-matching types explicitly in the warning
message.

Patch is against 4.1-rc2 (localversion-next is -next-20150505)

.../coccinelle/tests/signature_matches_ret.cocci | 31 ++++++++++++++++++++
1 file changed, 31 insertions(+)
create mode 100644 scripts/coccinelle/tests/signature_matches_ret.cocci

diff --git a/scripts/coccinelle/tests/signature_matches_ret.cocci b/scripts/coccinelle/tests/signature_matches_ret.cocci
new file mode 100644
index 0000000..2dd1086
--- /dev/null
+++ b/scripts/coccinelle/tests/signature_matches_ret.cocci
@@ -0,0 +1,31 @@
+// Find functions where return type and signature do not match
+// Comments:
+// Options: --no-includes --include-headers
+
+virtual context
+virtual org
+virtual report
+
+@match@
+identifier f,ret;
+position p;
+type T1,T2;
+@@
+
+T1 f(...) {
+ T2 ret;
+<+...
+* return@p ret
+;
+...+>
+}
+
+@script:python@
+p << match.p;
+fn << match.f;
+T1 << match.T1;
+T2 << match.T2;
+@@
+
+if T1 != T2:
+ print "%s:%s,%s WARNING: return of wrong type (%s != %s)" % (p[0].file,fn,p[0].line,T1,T2)
--
1.7.10.4


2015-05-05 12:33:30

by SF Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

> Check if the signature of a function and the return value type match.

Is this a task that is usually performed by a compiler?


> In many cases this mismatch will have no side-effects
> but in some cases it may lead to hard to locate problems

It is another software development challenge to find concrete
open issues there.


> - and for readability and code understanding it is also helpful
> when types match.

How would you like to check for compatible data types here?


> The output is a bit lengthy - not sure if that is too much
> but it seemed useful to me to see the non-matching types explicitly
> in the warning message.

How do you think about to import the result list into a database table?


> +if T1 != T2:
> + print "%s:%s,%s WARNING: return of wrong type (%s != %s)" % (p[0].file,fn,p[0].line,T1,T2)

Is such a check a bit too simple?

Regards,
Markus

2015-05-05 13:04:19

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

On Tue, 05 May 2015, SF Markus Elfring wrote:

> > Check if the signature of a function and the return value type match.
>
> Is this a task that is usually performed by a compiler?
>
>
> > In many cases this mismatch will have no side-effects
> > but in some cases it may lead to hard to locate problems
>
> It is another software development challenge to find concrete
> open issues there.
>
>
> > - and for readability and code understanding it is also helpful
> > when types match.
>
> How would you like to check for compatible data types here?
>

coccinelle knows the type so all you need to do is comare them in
phython .

>
> > The output is a bit lengthy - not sure if that is too much
> > but it seemed useful to me to see the non-matching types explicitly
> > in the warning message.
>
> How do you think about to import the result list into a database table?
>

working on that "re-cycling" your parameter count example
top 10:
488 ssize_t != int
195 int != unsigned int
183 long != int
113 int != u32
55 int != unsigned long
48 int != u8
45 int != u16
44 unsigned int != int
37 int != s32
30 int != long


>
> > +if T1 != T2:
> > + print "%s:%s,%s WARNING: return of wrong type (%s != %s)" % (p[0].file,fn,p[0].line,T1,T2)
>
> Is such a check a bit too simple?
>
Nop - why ?
Cocci "knwow" C so the assignment of types is reliable -
flaging a s32 != int is fine with respect to readability
even if they are technically the same.

thx!
hofrat

2015-05-05 13:29:33

by Julia Lawall

[permalink] [raw]
Subject: Re: [Cocci] [PATCH RFC] Coccinelle: Check for return not matching function signature

On Tue, 5 May 2015, Nicholas Mc Guire wrote:

> On Tue, 05 May 2015, SF Markus Elfring wrote:
>
> > > Check if the signature of a function and the return value type match.
> >
> > Is this a task that is usually performed by a compiler?
> >
> >
> > > In many cases this mismatch will have no side-effects
> > > but in some cases it may lead to hard to locate problems
> >
> > It is another software development challenge to find concrete
> > open issues there.
> >
> >
> > > - and for readability and code understanding it is also helpful
> > > when types match.
> >
> > How would you like to check for compatible data types here?
> >
>
> coccinelle knows the type so all you need to do is comare them in
> phython .
>
> >
> > > The output is a bit lengthy - not sure if that is too much
> > > but it seemed useful to me to see the non-matching types explicitly
> > > in the warning message.
> >
> > How do you think about to import the result list into a database table?
> >
>
> working on that "re-cycling" your parameter count example
> top 10:
> 488 ssize_t != int
> 195 int != unsigned int
> 183 long != int
> 113 int != u32
> 55 int != unsigned long
> 48 int != u8
> 45 int != u16
> 44 unsigned int != int
> 37 int != s32
> 30 int != long

Thanks for the specific results. That looks very useful.

julia

>
>
> >
> > > +if T1 != T2:
> > > + print "%s:%s,%s WARNING: return of wrong type (%s != %s)" % (p[0].file,fn,p[0].line,T1,T2)
> >
> > Is such a check a bit too simple?
> >
> Nop - why ?
> Cocci "knwow" C so the assignment of types is reliable -
> flaging a s32 != int is fine with respect to readability
> even if they are technically the same.
>
> thx!
> hofrat
> _______________________________________________
> Cocci mailing list
> [email protected]
> https://systeme.lip6.fr/mailman/listinfo/cocci
>

2015-05-05 13:46:12

by SF Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

>> How do you think about to import the result list into a database table?
>
> working on that "re-cycling" your parameter count example
> top 10:
> 488 ssize_t != int
> 195 int != unsigned int
> 183 long != int

?

Would you like to provide a static source code analysis
directly in such a report format?
Will a drill-down into corresponding implementation details
become more interesting for the detection of further open issues?

Regards,
Markus

2015-05-05 16:07:43

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

> +@match@
> +identifier f,ret;
> +position p;
> +type T1,T2;
> +@@
> +
> +T1 f(...) {
> + T2 ret;
> +<+...
> +* return@p ret
> +;
> +...+>
> +}

Given the number of results, it may seem surprising, but I think that you
are actually missing a lot of results. Becaue you require that ret be the
first variable that is declared in the function. Also, you require that
ret be an identifier. If you want to keep the restriction about being an
identifier, you could put:

@match exists@
type T1,T2;
idexpression T2 ret;
identifier f;
@@

T1 f(...) {
<+...
return@p ret;
...+>
}

If you don't care about the identifier constraint, then you can just put
T2 ret. Note also the addition of exists. There is a problem if only one
path has this property. Another thing you can do is the following:

@match exists@
type T1,T2;
expression T1 ok;
idexpression T2 ret;
identifier f;
@@

T1 f(...) {
<+...
(
return ok;
|
return@p ret;
)
...+>
}

Then Coccinelle will find the cases where the types are wrong, rather than
requiring a test in python.

(I haven't tested any of this)

julia

2015-05-05 16:11:38

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

On Tue, 05 May 2015, Julia Lawall wrote:

> > +@match@
> > +identifier f,ret;
> > +position p;
> > +type T1,T2;
> > +@@
> > +
> > +T1 f(...) {
> > + T2 ret;
> > +<+...
> > +* return@p ret
> > +;
> > +...+>
> > +}
>
> Given the number of results, it may seem surprising, but I think that you
> are actually missing a lot of results. Becaue you require that ret be the
> first variable that is declared in the function. Also, you require that
> ret be an identifier. If you want to keep the restriction about being an
> identifier, you could put:
>
> @match exists@
> type T1,T2;
> idexpression T2 ret;
> identifier f;
> @@
>
> T1 f(...) {
> <+...
> return@p ret;
> ...+>
> }
>

this is depressing - I now like by wrong solution even more ...
unfortunately you are right - I missed most - its now at 25146

> If you don't care about the identifier constraint, then you can just put
> T2 ret. Note also the addition of exists. There is a problem if only one
> path has this property. Another thing you can do is the following:
>
> @match exists@
> type T1,T2;
idexpression T1 ok;
> idexpression T2 ret;
> identifier f;
position p;
> @@
>
> T1 f(...) {
> <+...
> (
> return ok;
> |
> return@p ret;
> )
> ...+>
> }
>
> Then Coccinelle will find the cases where the types are wrong, rather than
> requiring a test in python.
>
> (I haven't tested any of this)

also works - I had naively expected this to be faster - but it does not
seem to be.

will check results did not expect 10% of the kernel functions
to have missmatching return types in atleast one of their paths.

thx!
hofrat

2015-05-05 16:25:52

by SF Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

> Then Coccinelle will find the cases where the types are wrong, rather than
> requiring a test in python.

Do you need a minus character in the first text column of the SmPL scripts then
to mark corresponding update candidates (instead of the mentioned warning printing)?

Regards,
Markus

2015-05-05 21:25:54

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature



On Tue, 5 May 2015, Nicholas Mc Guire wrote:

> On Tue, 05 May 2015, Julia Lawall wrote:
>
> > > +@match@
> > > +identifier f,ret;
> > > +position p;
> > > +type T1,T2;
> > > +@@
> > > +
> > > +T1 f(...) {
> > > + T2 ret;
> > > +<+...
> > > +* return@p ret
> > > +;
> > > +...+>
> > > +}
> >
> > Given the number of results, it may seem surprising, but I think that you
> > are actually missing a lot of results. Becaue you require that ret be the
> > first variable that is declared in the function. Also, you require that
> > ret be an identifier. If you want to keep the restriction about being an
> > identifier, you could put:
> >
> > @match exists@
> > type T1,T2;
> > idexpression T2 ret;

I was think ing that you don't want expression in general, because for all
contansts that will give you int.

You can of course put return C; for constant metavariable C in the
disjunction to avoid that possibility.

julia

> > identifier f;
> > @@
> >
> > T1 f(...) {
> > <+...
> > return@p ret;
> > ...+>
> > }
> >
>
> this is depressing - I now like by wrong solution even more ...
> unfortunately you are right - I missed most - its now at 25146
>
> > If you don't care about the identifier constraint, then you can just put
> > T2 ret. Note also the addition of exists. There is a problem if only one
> > path has this property. Another thing you can do is the following:
> >
> > @match exists@
> > type T1,T2;
> idexpression T1 ok;
> > idexpression T2 ret;
> > identifier f;
> position p;
> > @@
> >
> > T1 f(...) {
> > <+...
> > (
> > return ok;
> > |
> > return@p ret;
> > )
> > ...+>
> > }
> >
> > Then Coccinelle will find the cases where the types are wrong, rather than
> > requiring a test in python.
> >
> > (I haven't tested any of this)
>
> also works - I had naively expected this to be faster - but it does not
> seem to be.
>
> will check results did not expect 10% of the kernel functions
> to have missmatching return types in atleast one of their paths.
>
> thx!
> hofrat
>

2015-05-05 21:46:52

by Julia Lawall

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

On Tue, 5 May 2015, SF Markus Elfring wrote:

> > Then Coccinelle will find the cases where the types are wrong, rather than
> > requiring a test in python.
>
> Do you need a minus character in the first text column of the SmPL scripts then
> to mark corresponding update candidates (instead of the mentioned warning printing)?

He has a *, as he should.

julia

2015-05-06 07:16:17

by SF Markus Elfring

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

> +virtual context
> +virtual org
> +virtual report

Where do you want to reuse these variables in your SmPL scripts?


> +@match@
> +identifier f,ret;
> +position p;
> +type T1,T2;
> +@@
> +
> +T1 f(...) {
> + T2 ret;

Will it be more helpful to mark only such variable declarations
where the specified data type should be reconsidered
by a minus character or an asterisk (instead of the following
return statement)?

> +<+...
> +* return@p ret
> +;
> +...+>
> +}


Regards,
Markus

2015-05-08 06:59:21

by Nicholas Mc Guire

[permalink] [raw]
Subject: Re: [PATCH RFC] Coccinelle: Check for return not matching function signature

On Tue, 05 May 2015, Julia Lawall wrote:

>
>
> On Tue, 5 May 2015, Nicholas Mc Guire wrote:
>
> > On Tue, 05 May 2015, Julia Lawall wrote:
> >
> > > > +@match@
> > > > +identifier f,ret;
> > > > +position p;
> > > > +type T1,T2;
> > > > +@@
> > > > +
> > > > +T1 f(...) {
> > > > + T2 ret;
> > > > +<+...
> > > > +* return@p ret
> > > > +;
> > > > +...+>
> > > > +}
> > >
> > > Given the number of results, it may seem surprising, but I think that you
> > > are actually missing a lot of results. Becaue you require that ret be the
> > > first variable that is declared in the function. Also, you require that
> > > ret be an identifier. If you want to keep the restriction about being an
> > > identifier, you could put:
> > >
> > > @match exists@
> > > type T1,T2;
> > > idexpression T2 ret;
>
> I was think ing that you don't want expression in general, because for all
> contansts that will give you int.
>
> You can of course put return C; for constant metavariable C in the
> disjunction to avoid that possibility.
>
looks a lot better and removed a lot of false positives - the main problem
now is managing classification of the kernels type "system" - seems like there
are atleast 5 ways to describe every type (except for enum) and infinitely
many possible assignments for ssize_t ...

here a little summary of the outputs - might be motivation to put some
quite simple scanners into mainline to catch such issues.

comparison of return types in functions to the functions signature for
kernel 4.1-rc2, glibc-2.9 and busybox 1.2.2.1 - no particular reason for
that glibc/busybox versions they just happend to be on my harddrive.

This is using the version that was fixed by Julia Lawal
<snip>
@match exists@
type T1,T2;
idexpression T1 ok;
idexpression T2 ret;
identifier f;
constant C;
position p;
@@

T1 f(...) {
<+...
(
return ok;
|
return C;
|
return@p ret;
)
...+>
}
<snip>

component Nr funcs != type %
kernel : 374600 10727 2.85
glibc : 9184 268 2.92
busybox : 3645 43 1.18

kernel glibc busybox criticality
wrong ? : 8 4 0 not sure
sign missmatch : 2279 30 9 critical
down sized : 435 49 4 critical
up sized : 910 20 3 ugly
declaration missmatch : 7095 165 27 wishlist

wrong: seems plain wrong like float assigned to int (did not check details yet)
sign missmatch: assigning signed types to unsiggned or vice versa
down sized: some form of possible truncation like u64 being assigned u32
up sized: non-critical as it was correct type and it fit
declaration missmatch: means that they were named differently s32/int

Some limitations:
The glibc runs produced some error cases (spatch level) that were ignored
for now e.g.:
<snip>
EXN:Failure("match: node 194: return ...[1,2,90] in rec_dirsearch reachable
by inconsistent control-flow paths")
<snip>
The kernel numbers are a bit inaccurate because not all types can be
checked reliably - e.g. when they are config dependent also due to the
enourmous type-"system" in the kernel not all assignments are sure
but that does not change the overall result.
I did not yet manage to automate the classification - just too many types
where its hard to say due to config dependencies - probably need to put
thos into a "don't know" category. Also all assignments of pointers of
any type on one side to void * on the other side was counted as legitimate.
Some results were mangled probably because of inacurate filtering resulting
in things like "_EXTERN_INLINE != mp_limb_t" just dropped those for now.

Conclusion:
* atleast the sign missmatch cases (2279) and potentially truncating
assignments (435) are problematic.
* the scripts needs a lot more cleanup in the classification of the reported
types to be useful
* probably not realistic to cleanup all currently present tupe mismatches
but scanning continuously and reporting before it goes into mainline or
integrating such a check in the routine submission process seems
worthwhile

Once the classifier is working properly I'll post the next version.

thx!
hofrat