On Mon, 22 Jul 2019, Joe Perches wrote:
> Hello Julia.
>
> I just sent a patch to add yet another string copy mechanism.
>
> This could help avoid misuses of strscpy and strlcpy like this
> patch set:
>
> https://lore.kernel.org/lkml/[email protected]/T/
>
> A possible cocci script to do conversions could be:
>
> $ cat str.cpy.cocci
> @@
> expression e1;
> expression e2;
> @@
>
> - strscpy(e1, e2, sizeof(e1))
> + stracpy(e1, e2)
>
> @@
> expression e1;
> expression e2;
> @@
>
> - strlcpy(e1, e2, sizeof(e1))
> + stracpy(e1, e2)
>
> This obviously does not match the style of all the
> scripts/coccinelle cocci files, but this might be
> something that could be added improved and added.
>
> This script produces:
>
> $ spatch --in-place -sp-file str.cpy.cocci .
> $ git checkout tools/
> $ git diff --shortstat
> 958 files changed, 2179 insertions(+), 2655 deletions(-)
>
> The remainder of strlcpy and strscpy uses in the
> kernel would generally have a form like:
>
> strlcpy(to, from, DEFINE)
>
> where DEFINE is the specified size of to
>
> Could the cocci script above be updated to find
> and correct those styles as well?
I guess it would depend on what "to" is and what DEFINE expands into. For
example, in cpuidle-powernv.c, I see:
strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
and by poking around I see:
struct cpuidle_state {
char name[CPUIDLE_NAME_LEN];
char desc[CPUIDLE_DESC_LEN];
...
};
So I guess this case is ok.
I will look into it.
thanks,
julia
On Tue, 2019-07-23 at 15:52 -0500, Julia Lawall wrote:
> On Mon, 22 Jul 2019, Joe Perches wrote:
> > I just sent a patch to add yet another string copy mechanism.
> >
> > This could help avoid misuses of strscpy and strlcpy like this
> > patch set:
> >
> > https://lore.kernel.org/lkml/[email protected]/T/
> >
> > A possible cocci script to do conversions could be:
> >
> > $ cat str.cpy.cocci
> > @@
> > expression e1;
> > expression e2;
> > @@
> >
> > - strscpy(e1, e2, sizeof(e1))
> > + stracpy(e1, e2)
> >
> > @@
> > expression e1;
> > expression e2;
> > @@
> >
> > - strlcpy(e1, e2, sizeof(e1))
> > + stracpy(e1, e2)
> >
> > This obviously does not match the style of all the
> > scripts/coccinelle cocci files, but this might be
> > something that could be added improved and added.
> >
> > This script produces:
> >
> > $ spatch --in-place -sp-file str.cpy.cocci .
> > $ git checkout tools/
> > $ git diff --shortstat
> > 958 files changed, 2179 insertions(+), 2655 deletions(-)
> >
> > The remainder of strlcpy and strscpy uses in the
> > kernel would generally have a form like:
> >
> > strlcpy(to, from, DEFINE)
> >
> > where DEFINE is the specified size of to
> >
> > Could the cocci script above be updated to find
> > and correct those styles as well?
>
> I guess it would depend on what "to" is and what DEFINE expands into. For
> example, in cpuidle-powernv.c, I see:
>
> strlcpy(powernv_states[index].name, name, CPUIDLE_NAME_LEN);
>
> and by poking around I see:
>
> struct cpuidle_state {
> char name[CPUIDLE_NAME_LEN];
> char desc[CPUIDLE_DESC_LEN];
> ...
> };
Yes, ideally this case would not modify the #define for the
length but adapt the strlcpy(,,DEFINE)
There are a lot of these in drivers/hwmon using I2C_NAME_SIZE.
> I will look into it.
Thanks.
A seantic patch and the resulting output for the case where the third
arugument is a constant is attached. Likewise the resulting output on a
recent linux-next.
julia
On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> A seantic patch and the resulting output for the case where the third
> arugument is a constant is attached. Likewise the resulting output on a
> recent linux-next.
>
> julia
Nice. Thanks Julia
A couple issues:
There is a problem with conversions with assignments
of strlcpy() so ideally the cocci script should make sure
any return value was not used before conversion.
This is not a provably good conversion:
drivers/s390/char/sclp_ftp.c
@@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
sccb->evbuf.mdd.ftp.length = ftp->len;
sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
- len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
- HMCDRV_FTP_FIDENT_MAX);
+ len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
And:
I would have expected the bit below to find and convert uses like
drivers/hwmon/adc128d818.c: strlcpy(info->type, "adc128d818", I2C_NAME_SIZE);
but it seems none of those were converted.
I don't know why.
//------------------------------------------
@r1@
struct i1 *e1;
expression e2;
identifier f,i1,i2;
position p;
@@
\(strscpy\|strlcpy\)(e1->f, e2, i2)@p
@@
identifier r1.i1,r1.i2;
type T;
@@
struct i1 { ... T i1[i2]; ... }
@@
identifier f,i2;
expression e1,e2;
position r1.p;
@@
(
-strscpy
+stracpy
|
-strlcpy
+stracpy
)(e1->f, e2
- , i2
)@p
//------------------------------------------
to find
On Tue, 23 Jul 2019, Joe Perches wrote:
> On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> > A seantic patch and the resulting output for the case where the third
> > arugument is a constant is attached. Likewise the resulting output on a
> > recent linux-next.
> >
> > julia
>
> Nice. Thanks Julia
>
> A couple issues:
>
> There is a problem with conversions with assignments
> of strlcpy() so ideally the cocci script should make sure
> any return value was not used before conversion.
>
> This is not a provably good conversion:
>
> drivers/s390/char/sclp_ftp.c
> @@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
> sccb->evbuf.mdd.ftp.length = ftp->len;
> sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
>
> - len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
> - HMCDRV_FTP_FIDENT_MAX);
> + len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
Sorry, I don't understand the issue here. What specifically should I be
looking for?
>
> And:
>
> I would have expected the bit below to find and convert uses like
> drivers/hwmon/adc128d818.c: strlcpy(info->type, "adc128d818", I2C_NAME_SIZE);
> but it seems none of those were converted.
OK, thanks. I will check on it.
julia
> I don't know why.
>
> //------------------------------------------
> @r1@
> struct i1 *e1;
> expression e2;
> identifier f,i1,i2;
> position p;
> @@
> \(strscpy\|strlcpy\)(e1->f, e2, i2)@p
>
> @@
> identifier r1.i1,r1.i2;
> type T;
> @@
> struct i1 { ... T i1[i2]; ... }
>
> @@
> identifier f,i2;
> expression e1,e2;
> position r1.p;
> @@
> (
> -strscpy
> +stracpy
> |
> -strlcpy
> +stracpy
> )(e1->f, e2
> - , i2
> )@p
> //------------------------------------------
>
> to find
>
>
On Tue, 2019-07-23 at 23:27 -0500, Julia Lawall wrote:
>
> On Tue, 23 Jul 2019, Joe Perches wrote:
>
> > On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> > > A seantic patch and the resulting output for the case where the third
> > > arugument is a constant is attached. Likewise the resulting output on a
> > > recent linux-next.
> > >
> > > julia
> >
> > Nice. Thanks Julia
> >
> > A couple issues:
> >
> > There is a problem with conversions with assignments
> > of strlcpy() so ideally the cocci script should make sure
> > any return value was not used before conversion.
> >
> > This is not a provably good conversion:
> >
> > drivers/s390/char/sclp_ftp.c
> > @@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
> > sccb->evbuf.mdd.ftp.length = ftp->len;
> > sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
> >
> > - len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
> > - HMCDRV_FTP_FIDENT_MAX);
> > + len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
>
> Sorry, I don't understand the issue here. What specifically should I be
> looking for?
The return value of strlcpy differs from (strscpy or stracpy).
strlcpy returns the length of the src string
str[sa]cpy returns the length of the src string if it fits in dest
or -E2BIG if src is truncated by the size of dest
or -E2BIG if dest is 0 length
Any use of the strlcpy return value should not be converted
because the logic after an assignment or use of the return value
can not be assured to have the same behavior.
> > And:
> >
> > I would have expected the bit below to find and convert uses like
> > drivers/hwmon/adc128d818.c: strlcpy(info->type, "adc128d818", I2C_NAME_SIZE);
> > but it seems none of those were converted.
>
> OK, thanks. I will check on it.
Thanks again.
> julia
>
> > I don't know why.
> >
> > //------------------------------------------
> > @r1@
> > struct i1 *e1;
> > expression e2;
> > identifier f,i1,i2;
> > position p;
> > @@
> > \(strscpy\|strlcpy\)(e1->f, e2, i2)@p
> >
> > @@
> > identifier r1.i1,r1.i2;
> > type T;
> > @@
> > struct i1 { ... T i1[i2]; ... }
> >
> > @@
> > identifier f,i2;
> > expression e1,e2;
> > position r1.p;
> > @@
> > (
> > -strscpy
> > +stracpy
> > -strlcpy
> > +stracpy
> > )(e1->f, e2
> > - , i2
> > )@p
> > //------------------------------------------
> >
> > to find
> >
> >
From: Joe Perches
> Sent: 24 July 2019 05:38
> On Tue, 2019-07-23 at 23:27 -0500, Julia Lawall wrote:
> >
> > On Tue, 23 Jul 2019, Joe Perches wrote:
> >
> > > On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> > > > A seantic patch and the resulting output for the case where the third
> > > > arugument is a constant is attached. Likewise the resulting output on a
> > > > recent linux-next.
> > > >
> > > > julia
> > >
> > > Nice. Thanks Julia
> > >
> > > A couple issues:
> > >
> > > There is a problem with conversions with assignments
> > > of strlcpy() so ideally the cocci script should make sure
> > > any return value was not used before conversion.
> > >
> > > This is not a provably good conversion:
> > >
> > > drivers/s390/char/sclp_ftp.c
> > > @@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
> > > sccb->evbuf.mdd.ftp.length = ftp->len;
> > > sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
> > >
> > > - len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
> > > - HMCDRV_FTP_FIDENT_MAX);
> > > + len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
> >
> > Sorry, I don't understand the issue here. What specifically should I be
> > looking for?
>
> The return value of strlcpy differs from (strscpy or stracpy).
>
> strlcpy returns the length of the src string
> str[sa]cpy returns the length of the src string if it fits in dest
> or -E2BIG if src is truncated by the size of dest
> or -E2BIG if dest is 0 length
>
> Any use of the strlcpy return value should not be converted
> because the logic after an assignment or use of the return value
> can not be assured to have the same behavior.
Most of the code probably expects the length to be that copied
(so is broken if the string is truncated).
OTOH the code almost certainly doesn't expect -E2BIG either
so will go wrong in that case as well.
The return value from str[as]cpy() is its own set of bugs
just waiting to happen.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, 2019-07-24 at 10:28 +0000, David Laight wrote:
> From: Joe Perches
> > Sent: 24 July 2019 05:38
> > On Tue, 2019-07-23 at 23:27 -0500, Julia Lawall wrote:
> > > On Tue, 23 Jul 2019, Joe Perches wrote:
> > > > On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> > > > > A seantic patch and the resulting output for the case where the third
> > > > > arugument is a constant is attached. Likewise the resulting output on a
> > > > > recent linux-next.
[]
> > > > There is a problem with conversions with assignments
> > > > of strlcpy() so ideally the cocci script should make sure
> > > > any return value was not used before conversion.
> > > >
> > > > This is not a provably good conversion:
> > > >
> > > > drivers/s390/char/sclp_ftp.c
> > > > @@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
> > > > sccb->evbuf.mdd.ftp.length = ftp->len;
> > > > sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
> > > >
> > > > - len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
> > > > - HMCDRV_FTP_FIDENT_MAX);
> > > > + len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
[]
> > Any use of the strlcpy return value should not be converted
> > because the logic after an assignment or use of the return value
> > can not be assured to have the same behavior.
>
> Most of the code probably expects the length to be that copied
> (so is broken if the string is truncated).
Fortunately there are relatively few uses of the return
value of strlcpy so it's not a large problem set.
Slightly unrepresentative counts:
$ git grep -P "^\s+strlcpy\b" | wc -l
1681
$ git grep -P "=\s*strlcpy\b" | wc -l
28
On Wed, 24 Jul 2019, Joe Perches wrote:
> On Wed, 2019-07-24 at 10:28 +0000, David Laight wrote:
> > From: Joe Perches
> > > Sent: 24 July 2019 05:38
> > > On Tue, 2019-07-23 at 23:27 -0500, Julia Lawall wrote:
> > > > On Tue, 23 Jul 2019, Joe Perches wrote:
> > > > > On Tue, 2019-07-23 at 22:54 -0500, Julia Lawall wrote:
> > > > > > A seantic patch and the resulting output for the case where the third
> > > > > > arugument is a constant is attached. Likewise the resulting output on a
> > > > > > recent linux-next.
> []
> > > > > There is a problem with conversions with assignments
> > > > > of strlcpy() so ideally the cocci script should make sure
> > > > > any return value was not used before conversion.
> > > > >
> > > > > This is not a provably good conversion:
> > > > >
> > > > > drivers/s390/char/sclp_ftp.c
> > > > > @@ -114,8 +114,7 @@ static int sclp_ftp_et7(const struct hmc
> > > > > sccb->evbuf.mdd.ftp.length = ftp->len;
> > > > > sccb->evbuf.mdd.ftp.bufaddr = virt_to_phys(ftp->buf);
> > > > >
> > > > > - len = strlcpy(sccb->evbuf.mdd.ftp.fident, ftp->fname,
> > > > > - HMCDRV_FTP_FIDENT_MAX);
> > > > > + len = stracpy(sccb->evbuf.mdd.ftp.fident, ftp->fname);
> []
> > > Any use of the strlcpy return value should not be converted
> > > because the logic after an assignment or use of the return value
> > > can not be assured to have the same behavior.
> >
> > Most of the code probably expects the length to be that copied
> > (so is broken if the string is truncated).
>
> Fortunately there are relatively few uses of the return
> value of strlcpy so it's not a large problem set.
>
> Slightly unrepresentative counts:
>
> $ git grep -P "^\s+strlcpy\b" | wc -l
> 1681
> $ git grep -P "=\s*strlcpy\b" | wc -l
> 28
OK, it's easy to check for in any case. Thanks.
julia
New version. I check for non-use of the return value of strlcpy and
address some issues that affected the matching of the case where the first
argument involves a pointer dereference. Actually, an isomorphism now
takes care of that case, so it doesn't show up in the semantic patch
explicitly any more.
julia
> @r@
> identifier f,i1,i2;
> struct i1 e1;
> expression e2;
> position p;
> @@
> \(strscpy\|strlcpy\)(e1.f, e2, i2)@p
I have got the impression that the replacement can work also
without an inherited position variable at the end.
How do you think about to omit this SmPL rule then?
Can it be nicer to reduce duplicate SmPL code a bit?
Regards,
Markus
On Thu, 25 Jul 2019, Markus Elfring wrote:
> > @r@
> > identifier f,i1,i2;
> > struct i1 e1;
> > expression e2;
> > position p;
> > @@
> > \(strscpy\|strlcpy\)(e1.f, e2, i2)@p
>
> I have got the impression that the replacement can work also
> without an inherited position variable at the end.
> How do you think about to omit this SmPL rule then?
>
> Can it be nicer to reduce duplicate SmPL code a bit?
Huh? Rule 2 is important, to ensure that ths size is correct. Without
rule 1, how can rule 2 be checked?
julia
On Wed, 2019-07-24 at 20:42 -0500, Julia Lawall wrote:
> New version. I check for non-use of the return value of strlcpy and
> address some issues that affected the matching of the case where the first
> argument involves a pointer dereference. Actually, an isomorphism now
> takes care of that case, so it doesn't show up in the semantic patch
> explicitly any more.
>
> julia
Nice x 2, thanks again.
More comments:
@@
identifier f,i2,i1;
struct i1 e1;
expression e2;
local idexpression x;
position r.p;
@@
(
-x = strlcpy
+stracpy
(e1.f, e2
- , i2
)@p;
... when != x
Just for completeness and correctness, as I at
least don't find an existing use:
Perhaps this "x =" should also include += and +
and the various other operators that are possible
or does SmPL grammar already do that?
Also, it might be nice to include the trivial
conversion with sizeof(e1) and ARRAY_SIZE(e1)
so a single script could be run over the kernel.
I'll see about adding that and try it myself
so an automated conversion should be possible.
On Thu, 25 Jul 2019, Joe Perches wrote:
> On Wed, 2019-07-24 at 20:42 -0500, Julia Lawall wrote:
> > New version. I check for non-use of the return value of strlcpy and
> > address some issues that affected the matching of the case where the first
> > argument involves a pointer dereference. Actually, an isomorphism now
> > takes care of that case, so it doesn't show up in the semantic patch
> > explicitly any more.
> >
> > julia
>
> Nice x 2, thanks again.
Not quite nice due to the ignoring of rule 2 noticed by Markus. There is
actually currently no guarantee that the size is right. I'm testing a new
version.
>
> More comments:
>
> @@
> identifier f,i2,i1;
> struct i1 e1;
> expression e2;
> local idexpression x;
> position r.p;
> @@
> (
> -x = strlcpy
> +stracpy
> (e1.f, e2
> - , i2
> )@p;
> ... when != x
>
> Just for completeness and correctness, as I at
> least don't find an existing use:
>
> Perhaps this "x =" should also include += and +
> and the various other operators that are possible
> or does SmPL grammar already do that?
I could do this. One might though think that if someone went to the
trouble of computing +=, these would be cases that we don't want to
change? Still, it's not problem to add all assignment operators.
> Also, it might be nice to include the trivial
> conversion with sizeof(e1) and ARRAY_SIZE(e1)
> so a single script could be run over the kernel.
Sure, I'll do that when all this is working. I didn't want those results
to drown out these ones.
thanks,
julia
>
> I'll see about adding that and try it myself
> so an automated conversion should be possible.
>
>
>
> New version. I check for non-use of the return value of strlcpy and
> address some issues that affected the matching of the case where the first
> argument involves a pointer dereference.
I suggest to take another look at corresponding implementation details
of the shown SmPL script.
> \(strscpy\|strlcpy\)(e1.f, e2, i2)@p
Can the data access operator “->” (arrow) matter also here?
> @@
> identifier r.i1,r.i2;
> type T;
> @@
> struct i1 { ... T i1[i2]; ... }
Will an additional SmPL rule name be helpful for this part?
> @@
> (
> -x = strlcpy
> +stracpy
> (e1.f, e2
> - , i2
> )@p;
> ... when != x
>
> |
I wonder about the deletion of the assignment target.
Should the setting of such a variable be usually preserved?
Regards,
Markus
On Thu, 2019-07-25 at 08:58 -0500, Julia Lawall wrote:
> On Thu, 25 Jul 2019, Joe Perches wrote:
[]
> > Just for completeness and correctness, as I at
> > least don't find an existing use:
> >
> > Perhaps this "x =" should also include += and +
> > and the various other operators that are possible
> > or does SmPL grammar already do that?
>
> I could do this. One might though think that if someone went to the
> trouble of computing +=, these would be cases that we don't want to
> change?
Maybe I quoted the wrong bit. But exactly.
Anywhere the return value of strlcpy is used, not just as
an assignment, is an instance that should not be changed.
Thanks for doing this.
> Huh? Rule 2 is important, to ensure that ths size is correct.
I assume that the dependency of the replacement on the data structure check
can become clearer.
Regards,
Markus
> > > Perhaps this "x =" should also include += and +
> > > and the various other operators that are possible
> > > or does SmPL grammar already do that?
This is now done. It seems to have had no impact.
> Anywhere the return value of strlcpy is used, not just as
> an assignment, is an instance that should not be changed.
Mostly what is changed for strlcpy is the case where there is a ; after
the call, so that is not going to match an if test, etc. It also doesn't
match the right side of an assignment. The only case of an assignment
that is matched is when the variable is not used afterwards.
The rule now properly checks that the third argument is the size of the
first argument. This made a small reduction in the number of results.
julia
> The rule now properly checks that the third argument is the size of the
> first argument. This made a small reduction in the number of results.
Thanks for your SmPL script adjustments.
Will deviations from this restriction become more interesting?
> \(strscpy\|strlcpy\)(e1.f, e2, i2)@p
Would you like to take the function “strscpy_pad” also into account for
source code transformations with the macro “stracpy_pad”?
Regards,
Markus
I see that stracpy is now in linux-next. Would it be reasonable to send
patches adding uses now? Are there any rules for adding calls to
stracpy_pad?
julia
On Mon, 2019-07-29 at 10:07 -0400, Julia Lawall wrote:
> I see that stracpy is now in linux-next. Would it be reasonable to send
> patches adding uses now?
My preference would be to have:
o A provably correct script If a small subset of
possible conversions are skipped, that's fine.
o As piecemeal patches cause a lot of churn, work
for individual maintainers, and also are not
universally applied, have that script run
kernel-wide after an rc1 and applied all-at-once.