2003-02-27 06:40:08

by Dan Kegel

Subject: [PATCH] kernel source spellchecker

Since the main remaining feature before release of the 2.6
kernel is fixing all the remaining spelling errors,
this patch seems appropriate. This is against 2.4 but
should apply to other versions as well.
It's not very smart, but should help get us to our
all-important goal of 100% correctly spellt kernel source.
Todo: make it ignore names from the MAINTAINERS file,
the list of signals and syscalls, and other well-known
english words seem mostly in Webster's Posix edition;
rewrite in Perl rather than C, or add real Makefile entry.
Enjoy!
- Dan

--- /dev/null 2002-08-30 16:31:37.000000000 -0700
+++ linux/scripts/spellcheck-kernel 2003-02-26 22:51:46.000000000 -0800
@@ -0,0 +1,12 @@
+#!/bin/sh
+# Script to spellcheck kernel.
+# usage: spellcheck-kernel [ sourcedir ]
+# The source directory defaults to /usr/src/linux.
+# e.g.
+# scripts/spellcheck-kernel .
+# Check spelling of the kernel tree in the current directory
+
+sourcedir=${1-/usr/src/linux}
+
+make -C .. scripts/lspell
+find $sourcedir -name '*.[ch]' | xargs ./lspell
--- /dev/null 2002-08-30 16:31:37.000000000 -0700
+++ linux/scripts/lspell.c 2003-02-26 22:51:14.000000000 -0800
@@ -0,0 +1,75 @@
+/*
+ * C comment spell checker
+ * For each given source file, print the filename, then
+ * extract all comments from the file, send them through the system
+ * spellchecker, sort the list of words flagged as misspellings,
+ * and word-wrap the sorted list.
+ * Copyright 2003, Dan Kegel. Licensed under GPL. See the file ../COPYING for details.
+ */
+#include <stdio.h>
+#include <stdlib.h>
+int
+main(int argc, char **argv)
+{
+ int argi;
+
+ for (argi = 1; argi < argc; argi++) {
+ int c;
+ enum state_t { NONCOMMENT, SLASH, COMMENT, STAR, EOLCOMMENT };
+ enum state_t state = NONCOMMENT;
+ FILE *fp = fopen(argv[argi], "rt");
+ if (!fp) {
+ perror(argv[argi]);
+ continue;
+ }
+ FILE *pout = popen("/usr/bin/spell | sort -f | fmt", "w");
+ if (!pout) {
+ perror("/usr/bin/spell | sort -f | fmt");
+ exit(1);
+ }
+ printf("\n%s:\n", argv[argi]);
+ fflush(stdout);
+ while ((c = getc(fp)) != EOF) {
+ switch (state) {
+ case NONCOMMENT:
+ if (c == '/')
+ state = SLASH;
+ break;
+ case SLASH:
+ if (c == '*')
+ state = COMMENT;
+ else if (c == '/')
+ state = EOLCOMMENT;
+ else {
+ state = NONCOMMENT;
+ }
+ break;
+ case COMMENT:
+ if (c == '*')
+ state = STAR;
+ else
+ fputc(c, pout);
+ break;
+ case STAR:
+ if (c == '/')
+ state = NONCOMMENT;
+ else {
+ if (c != '*') {
+ fputc('\n', pout);
+ state = COMMENT;
+ }
+ }
+ break;
+ case EOLCOMMENT:
+ if (c == '\n')
+ state = NONCOMMENT;
+ else
+ fputc(c, pout);
+ break;
+ }
+ }
+ pclose(pout);
+ fclose(fp);
+ }
+ exit(0);
+}

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045


2003-03-01 05:18:14

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

accomodate
Acknowledgement
acknowledgements
adaptor
adaptors
adddress
additionnal
alignement
allways
analyse
angerous
apropriate
arround
assosciated
assosiated
asyncronous
Auxillary
availible
avaliable
basicly
beeing
borken
boundry
bramaged
cacheable
callin
cancelled
capabilites
childs
choosen
comamnd
comming
commited
comparision
Compatability
compatibilty
compatiblity
completly
concurent
Continous
continous
controler
controllen
coresponding
decrementer
decriptor
defered
defintions
denormalised
denormalized
dependend
desciptor
devide
differenciate
doesnt
DONT
dont't
dugger
emptive
entrancy
entrys
everytime
explicitely
foward
fuction
funtion
guarenteed
handeling
harware
hasnt
havn't
houldn't
hysical
i'm
immediatly
implemantation
implmentation
Incomming
incomming
indice
infomation
Inifity
inital
initalization
Initalize
initalize
inited
initilization
initing
inteface
interrrupt
interrups
Interupt
intervall
intialization
Intialize
intialize
invokation
is'nt
Lenght
managment
mergeable
middelin
modelled
Modularisation
modularisation
Modularised
neccessary
negociated
Neighbour
neighbour
Noone
nuclecu
occured
occurence
occuring
organised
ouput
outputing
overriden
paramter
paramters
Passthrough
passthru
performace
popies
preceeding
promiscous
realise
realised
receving
Recieve
recieve
recieved
recognised
reenable
reentrance
registred
Regsiter
relevent
Reorganisation
reorganised
requeue
reselection
resetted
ressources
scather
serialisation
shouldnt
signalled
signalling
sleepie
specifc
specifed
speficied
sublicense
succesful
successfull
superflous
Synchronisation
synchronisation
there're
threshhold
throught
thru
timming
TORTIOUS
tranceiver
transfering
transmiting
trasfered
truely
tunables
uffer
uglyness
uncachable
unrecognised
useable
usefull
verticies
waranty
watseful
wierd
writeable
writting


Attachments:
errors.txt (1.77 kB)

2003-03-01 14:01:05

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

On Fri, 28 Feb 2003, Dan Kegel wrote:

> Joe Perches wrote:
> > On Wed, 2003-02-26 at 22:59, Dan Kegel wrote:
> >
> >>Since the main remaining feature before release of the 2.6
> >>kernel is fixing all the remaining spelling errors,
> >>this patch seems appropriate.
> >
> >
> > Who let the comedian in? :o
>
> At first I was jokeing, but what the heck, I figured I'd run
> it. Here are the mispelled words that occur in five
> or more files and that lookd like real misspellings to my eye.
> The list contains some words that are ok in British usage;
> I don't have a British spellchecker (that I know how to use).
>
> Perhaps some eagr Perl monger can (after removing the British-ok
> words!) contribute a spellcorrect-kernel program that takes
> in a liste of known misspellings + corrections, and applies
> them to the commments in all kernel source files...
> - Dan

I've no spelling knowledge, so the list of spellcorrections must be made
by someone else. But i can volunteer the perl-snippet to correct the
files. :-)

See attachment.

The program uses this file format:
correct=false1,false2,false3...

As there are many ways to misspell a word, I think this is the best(tm)
solution to get a readable file.
:-)
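
The format above is simple enough to parse by hand. What follows is a minimal sketch in C, not the attached Perl script; the names `struct correction`, `parse_correction`, `MAX_WRONG` and `MAX_WORD` are made up for illustration, and skipping empty fields (such as after a trailing comma) is an assumption.

```c
#include <string.h>

#define MAX_WRONG 16   /* hypothetical limit on misspellings per line */
#define MAX_WORD  64   /* hypothetical limit on word length, incl. NUL */

/* One parsed line of the corrections file: correct=false1,false2,... */
struct correction {
    char right[MAX_WORD];
    char wrong[MAX_WRONG][MAX_WORD];
    int  nwrong;
};

/*
 * Parse "correct=false1,false2" into *c.
 * Returns 0 on success, -1 if there is no '=', the correct word is
 * empty, or a field does not fit. Empty fields are skipped.
 */
int parse_correction(const char *line, struct correction *c)
{
    const char *eq = strchr(line, '=');
    const char *p;
    size_t len;

    if (!eq || eq == line)
        return -1;
    len = (size_t)(eq - line);
    if (len >= MAX_WORD)
        return -1;
    memcpy(c->right, line, len);
    c->right[len] = '\0';
    c->nwrong = 0;

    p = eq + 1;
    while (*p && *p != '\n') {
        size_t n = strcspn(p, ",\n");   /* length of the next field */

        if (n >= MAX_WORD)
            return -1;
        if (n > 0) {
            if (c->nwrong == MAX_WRONG)
                return -1;
            memcpy(c->wrong[c->nwrong], p, n);
            c->wrong[c->nwrong][n] = '\0';
            c->nwrong++;
        }
        p += n;
        if (*p == ',')
            p++;
    }
    return 0;
}
```

Keeping the correct word on the left means one line can collect every known misspelling of it, which is the readability argument made above.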

I've only done a "quick-debug", so there might still be errors in the
program. (Including spelling-bugs ;-) )

- snip -
Usage: spell_fix.pl <options>, where valid options are
--help # this message :-)
--file <file> # File(s) to be checked
--dir <dir> # Directory(s) to be checked (recursive!)
--spell-file # File with the correction-list
--debug # Debugging-Messages
- snip -




Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Attachments:
spell-fix.pl (3.61 kB)

2003-03-01 15:42:27

by Shaheed

Subject: Re: [PATCH] kernel source spellchecker


Matthias,

Here is a list of corrections...I have omitted those that seem OK to me,
apostrophes, proper names, some that seem to be hyphenation-related, American
vs. British differences and a few others.

In the case of broken American spelling, I have provided American fixes
(against my better judgement :-)). Enjoy...

accommodate=accomodate
adapter=adaptor
address=adddress
additional=additionnal
alignment=alignement
always=allways
appropriate=apropriate
around=arround
associated=assosciated,assosiated
asynchronous=asyncronous
Auxillary=Auxillary
available=availible,avaliable
basically=basicly
being=beeing
broken=borken
boundary=boundry
brain-damaged=dain-bramaged,dain bramaged
calling=callin
capabilities=capabilites
chosen=choosen
command=comamnd
coming=comming
committed=commited
comparison=comparision
Compatibility=Compatability
compatibility=compatibilty,compatiblity
completely=completly
concurrent=concurent
Continuous=Continous
continuous=continous
controller=controler,controllen
corresponding=coresponding
decrementor=decrementer
descriptor=decriptor,desciptor
deferred=defered
definitions=defintions
dependent=dependend
divide=devide
differentiate=differenciate
entries=entrys
everytime=everytime
explicitly=explicitely
forward=foward
function=fuction,funtion
guaranteed=guarenteed
handling=handeling
hardware=harware
physical=hysical
immediately=immediatly,
implementation=implemantation,implmentation
Incoming=Incomming
incoming=incomming
index=indice
information=infomation
Infinity=Inifity
initial=inital
initialization=initalization,initilization,intialization
Initialize=Initalize,Intialize
initialize=initalize,intialize
interface=inteface
Interrupt=Interupt
interrupt=interrrupt
interrupts=interrups
interval=intervall
invocation=invokation
Length=Lenght
management=managment
necessary=neccessary
negotiated=negociated
No-one=Noone
occurred=occured
occurrance=occurence
occurring=occuring
output=ouput
outputting=outputing
overridden=overriden
parameter=paramter
parameters=paramters
performance=performace
promiscuous=promiscous
receiving=receving
Receive=Recieve
receive=recieve
received=recieved
registered=registred
Register=Regsiter
relevant=relevent
resources=ressources
scatter=scather
specific=specifc
specified=specifed,speficied
successful=succesful,successfull
superfluous=superflous
threshold=threshhold
through=throught
timing=timming
transceiver=tranceiver
transferring=transfering
transmitting=transmiting
transferred=trasfered
truly=truely
ugliness=uglyness
usable=useable
useful=usefull
vertices=verticies
warranty=waranty
wasteful=watseful
writing=writting

2003-03-01 16:25:38

by Jörn Engel

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 1 March 2003 15:57:13 +0000, shaheed wrote:
>
> Here is a list of corrections...I have omitted those that seem OK to me,
> apostrophes, proper names, some that seem to be hypenation-related, American
> vs. British differences and a few others.
>
> In the case of broken American spelling, I have provided American fixes
> (against my better judgement :-)). Enjoy...
>
> brain-damaged=dain-bramaged,dain bramaged

What's wrong with those? Yes, they don't exist in a dictionary, but
neither do a
c
o
rner or d-a-s-h-i-n-g or the famous
p z
y r a and i g g
m i d s u r a t s

I don't know where art starts and whether kernel documentation is a
decent place for it, but a second thought might be worth it. :)

btw: man fortune | grep redundancy -C2

Jörn, bringing the comedian back

--
Fantasy is more important than knowlegde. Knowlegde is limited,
while fantasy embraces the whole world.
-- Albert Einstein

2003-03-01 17:03:04

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

Hi


On Sat, 1 Mar 2003, Matthias Schniedermeyer wrote:

> I've no spelling knowledge, so the list of spellcorrections must be made
> by someone else. But i can volunteer the perl-snippet to correct the
> files. :-)

Take 1.01

A minor revision of the Perl program. It now ignores empty lines and
strips comments and leading/trailing whitespace.
In debug mode it now prints which words were missspelled (3x"s"?).
The spell file (default "spell_fix.txt") is now also searched for in the
directory from which spell_fix.pl was started.

And the list contributed by shaheed
(minus "everytime=everytime" and "Auxillary=Auxillary")

With this list a run over 2.5.63 found 2866 spelling-errors. (counting the
+ lines in the diff.) The diff is 18690 lines long (846689 bytes)





Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Attachments:
spell-fix.pl (4.27 kB)
spell-fix.txt (2.29 kB)

2003-03-01 17:46:15

by Shaheed

Subject: Re: [PATCH] kernel source spellchecker

On Saturday 01 Mar 2003 4:35 pm, Jörn Engel wrote:
> On Sat, 1 March 2003 15:57:13 +0000, shaheed wrote:
...
> >
> > brain-damaged=dain-bramaged,dain bramaged
>
> What's wrong with those? Yes, they don't exist in a dictionary, but
> neither do a
...
> I don't know where art starts and whether kernel documentation is a
> decent place for it, but a second thought might be worth it. :)

Hi Jörn,

Like you, I do get the joke: but in this case as in others, I took the view
that the needs of those whose first language was not English should prevail.
(Yes, I also know about the convention that the kernel is defined to be in
English.)

And, being not yet of even janitor status (:-)) myself, I did not submit the
patch: I guess whoever does that has the final call!

Cheers, Shaheed

2003-03-01 18:21:43

by Jörn Engel

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 1 March 2003 18:01:19 +0000, shaheed wrote:
> On Saturday 01 Mar 2003 4:35 pm, Jörn Engel wrote:
> > On Sat, 1 March 2003 15:57:13 +0000, shaheed wrote:
> > >
> > > brain-damaged=dain-bramaged,dain bramaged
> >
> > What's wrong with those? [...]
>
> Like you, I do get the joke: but in this case as in others, I took the view
> that the needs of those whose first language was not English should prevail.
> (Yes, I also know about the convention that the kernel is defined to be in
> English.)
>
> And, being not yet of even janitor status (:-)) myself, I did not submit the
> patch: I guess whoever does that has the final call!

ack.

I just wanted to make sure it would not be "corrected" by mistake. If
those are supposed to go, so be it. Things will be good now. :-)

Jörn

--
The competent programmer is fully aware of the strictly limited size of
his own skull; therefore he approaches the programming task in full
humility, and among other things he avoids clever tricks like the plague.
-- Edsger W. Dijkstra

2003-03-01 18:29:26

by Pascal Schmidt

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 01 Mar 2003 17:00:17 +0100, you wrote:

> brain-damaged=dain-bramaged,dain bramaged

That's not a spelling error, it's a joke.

--
Ciao,
Pascal

2003-03-01 18:33:50

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Matthias Schniedermeyer wrote:
> I've no spelling knowledge, so the list of spellcorrections must be made
> by someone else. But i can volunteer the perl-snippet to correct the
> files. :-)

Smashing! However, it should probably avoid correcting spellings
in anything but C comments.
Perhaps my C comment parser should be converted to perl and
incorporated into spell-fix.pl, and used to divide the source
file into two streams (comment and noncomment); the comment
stream would be spell-fixed and merged back with the noncomment
stream to create the output.
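
A rough sketch of that idea in C, under stated assumptions: rather than producing two physical streams, it tracks comment state in a single pass and substitutes one misspelling only while inside a comment. `fix_in_comments`, `emit` and `is_word_char` are hypothetical names, not part of lspell.c or spell-fix.pl, and string and character literals are deliberately not handled.

```c
#include <ctype.h>
#include <string.h>

static int is_word_char(int ch)
{
    return isalnum((unsigned char)ch) || ch == '_';
}

/* Append n bytes of s to out, keeping room for the terminating NUL. */
static int emit(char *out, size_t outsz, size_t *o, const char *s, size_t n)
{
    if (*o + n >= outsz)
        return -1;
    memcpy(out + *o, s, n);
    *o += n;
    return 0;
}

/*
 * Copy src to out, replacing whole-word occurrences of `wrong` with
 * `right`, but only inside C comments (block or end-of-line style).
 * Returns 0 on success, -1 if out is too small.
 * Simplification: a comment opener inside a string literal is treated
 * as a real comment.
 */
int fix_in_comments(const char *src, const char *wrong, const char *right,
                    char *out, size_t outsz)
{
    enum { CODE, BLOCK, LINE } state = CODE;
    size_t wlen = strlen(wrong), rlen = strlen(right);
    size_t i = 0, o = 0;

    if (outsz == 0)
        return -1;
    while (src[i]) {
        const char *s = src + i;  /* bytes to emit for this step */
        size_t n = 1;             /* number of bytes to emit */
        size_t take = 1;          /* number of input bytes consumed */

        if (state == CODE && src[i] == '/' && src[i + 1] == '*') {
            state = BLOCK; n = take = 2;
        } else if (state == CODE && src[i] == '/' && src[i + 1] == '/') {
            state = LINE; n = take = 2;
        } else if (state == BLOCK && src[i] == '*' && src[i + 1] == '/') {
            state = CODE; n = take = 2;
        } else if (state == LINE && src[i] == '\n') {
            state = CODE;
        } else if (state != CODE && wlen > 0 &&
                   strncmp(s, wrong, wlen) == 0 &&
                   (i == 0 || !is_word_char(src[i - 1])) &&
                   !is_word_char(src[i + wlen])) {
            s = right; n = rlen; take = wlen;
        }
        if (emit(out, outsz, &o, s, n) < 0)
            return -1;
        i += take;
    }
    out[o] = '\0';
    return 0;
}
```

Because non-comment bytes are copied through unchanged, an identifier such as `recieve` in code is left alone, while the same word in a comment is corrected.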

And thanks to Shaheed for converting my list of misspellings to a correction list!

I suggest we remove the entries
broken=borken
brain-damaged=dain-bramaged,dain bramaged
as we're not trying to remove humor from the comments.

Also, the words 'controllen' and 'callin', are not typos, so
calling=callin
should be removed, and
controller=controler,controllen
should be just
controller=controler

The above examples make me think the list of corrections will have
to be very carefully vetted before we turn this thing loose.
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-01 19:09:15

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 2003-03-01 at 11:54, Dan Kegel wrote:

>
> I suggest we remove the entries
> broken=borken
> brain-damaged=dain-bramaged,dain bramaged
> as we're not trying to remove humor from the comments.
>
> Also, the words 'controllen' and 'callin', are not typos, so
> calling=callin
> should be removed, and
> controller=controler,controllen
> should be just
> controller=controler
>
> The above examples make me think the list of corrections will have
> to be very carefully vetted before we turn this thing loose.
> - Dan

Once you've loosed your beast upon the tree, I'd suggest that you
very carefully look through the resulting diff for inappropriate
corrections and redact the unnecessary hunks. In the spelling fixes
which I sent to Linus, I redacted hunks which didn't need fixing. For
example, Linus making fun of Sun folks' ability to spell, etc. and some
comments in French or German for which the spelling was correct in those
languages.

In addition to making fixes in the comments in the source, all of
Documentation should be fair game.

Then you'll have to contend with the folks whose out-of-tree patches
you've borked.

Steven

2003-03-01 19:20:28

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

Hi




> Matthias Schniedermeyer wrote:
> > I've no spelling knowledge, so the list of spellcorrections must be made
> > by someone else. But i can volunteer the perl-snippet to correct the
> > files. :-)
>
> Smashing! However, it should probably avoid correcting spellings
> in anything but C comments.

Here we go.

Take 1.10

This version defaults to correcting words only within comments.

> Perhaps my C comment parser should be converted to perl and

No need to go to that hassle.

// Comments are easy(tm). "Everything after // until line-end".

and /* ... */ are easy(tm) too, because gcc doesn't support nesting them.

Only a handful of lines were needed to handle this. :-)





Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Attachments:
spell-fix.pl (5.03 kB)

2003-03-01 20:23:36

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

HI


> > Matthias Schniedermeyer wrote:
> > > I've no spelling knowledge, so the list of spellcorrections must be made
> > > by someone else. But i can volunteer the perl-snippet to correct the
> > > files. :-)
> >
> > Smashing! However, it should probably avoid correcting spellings
> > in anything but C comments.
>
> Here we go.
>
> Take 1.10

Oops. A bit too "noisy".

Take 1.10b



Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Attachments:
spell-fix.pl (5.03 kB)

2003-03-01 20:59:33

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Steven Cole wrote:
> Once you've loosed your beast upon the tree, I'd suggest that you
> very carefully look through the resulting diff for inappropriate
> corrections and redact the unnecessary hunks. In the spelling fixes
> which I sent to Linus, I redacted hunks which didn't need fixing. For
> example, Linus making fun of Sun folks' ability to spell, etc. and some
> comments in French or German for which the spelling was correct in those
> languages.

Good points.

> In addition to making fixes in the comments in the source, all of
> Documentation should be fair game.

Yeah, but that's easy :-)

> Then you'll have to contend with the folks whose out-of-tree patches
> you've borked.

That's a good argument for making the spellfix program polished
enough that everyone can use it, I think. Those maintaining
out-of-tree patches can run the tool on their tree, and regenerate
diffs.

- Dan



--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-01 21:05:16

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Matthias Schniedermeyer wrote:
> This versions defaults to only correct words within a comment. ...
> // Comments are easy(tm). "Everything after // until line-end".
>
> and /* ... */ are easy(tm) too because gcc doesn't support to nest them.

I'll be damned. I'm impressed with how easy that was in perl.
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-01 21:15:22

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

On Sat, Mar 01, 2003 at 01:25:53PM -0800, Dan Kegel wrote:
> Matthias Schniedermeyer wrote:
> >This versions defaults to only correct words within a comment. ...
> >// Comments are easy(tm). "Everything after // until line-end".
> >
> >and /* ... */ are easy(tm) too because gcc doesn't support to nest them.
>
> I'll be damned. I'm impressed with how easy that was in perl.

As long as there is no nesting involved, most things are easy/trivial to
achieve with REs.




Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2003-03-02 01:48:13

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

accessible=accesible
accessing=accesing
accommodate=accomodate,acommodate
Acknowledge=Acknowlege
acknowledged=acknoledged
acquire=aquire
across=accross
actions=actons
adapter=adapater,adaptor,adatper
additional=additionnal
Additional=Addtional
address=adddress,addresss
Address=Adress
Aggressive=Agressive
aggressively=agressively
aligned=alligned
alignment=alignement
already=allready
Always=Allways
always=allways
amount=ammount
appropriate=appropiate,approriate,apropriate
arbitrarily=arbitarily,aribtrarily
Arbitrary=Arbitary
arbitrary=aribtrary
around=arround
assembler=asssembler
associated=assosciated,assosiated
assume=asume
asynchronous=asyncronous
at least=atleast
atomically=atomicly
Auxiliary=Auxilary
available=availble,availible,avaliable
Basically=Basicly
basically=basicly
because=becuase
beginning=beggining
being=beeing
boundaries=boundries
boundary=boundry
cancellation=cancelation
capabilities=capabilites
caught=catched
changeable=changable
character=charater
Characters=Characteres
chose=choosed
chosen=choosen
circumstances=cirumstances
coming=comming
command=comamnd
commence=commense
committed=commited
communication=commuication
comparison=comparision
compatibility=compability
Compatibility=Compatability
compatibility=compatibilty,compatiblity
completely=completly
concurrent=concurent
configuration=configration
consecutive=consequtive
constants=konstants
consumer=comsumer
contiguous=contigious,contingous
Continuous=Continous
continuous=continous
control=controll
controller=contoller,controler
controlling=controling
Converted=Coverted
corresponding=coresponding
courtesy=curteousy
deactivate=deactive
Debugging=Debuging
debugging=debuging
decrementor=decrementer
deferred=defered
definitions=defintions
dependent=dependend
deprecated=depricated
descendant=descendent
descriptor=decriptor,desciptor
developed=developped
didn't=didnt
differentiate=differenciate
discipline=discpline
discontiguous=discontigous
distinguish=distingush
divide=devide
divisor=divizor
Do not=Donot
doesn't=doens't
DOESN'T=DOESNT
doesn't=doesnt
DON'T=DONT
don't=dont't
dynamically=dynamicly
efficient=efficent
empirical=imperical
enhancements=enhandcements
enough=enought
entries=entrys
environment=enviroment
equipped=equiped
error=errror
Evaluate=Evalute
every time=everytime
excess=execess
execution=exection
existence=existance
explicitly=explicitely,explicity
extended=extented
extension=extention
firmware=firware
forward=foward
function=fucntion,fuction,funcion,funciton,functin,funtion
further=furthur
guaranteed=guarenteed
handling=handeling
hardware=harware
hasn't=hasnt
haven't=havn't
I'm=i'm
identical=indentical
immediately=immediatly
implementation=implemantation,implemenation,implmentation
Incoming=Incomming
incoming=incomming
index=indice
indices=indeces
Infinity=Inifity
information=infomation,informatation
initial=inital
initialization=initalization,initilization,intialisation,intialization
Initialize=Initalize
initialize=initalize
Initialize=Initialyze,Initilialyze,Initilize
initialize=initilize,intiailize
Initialize=Intialize
initialize=intialize
instance=isntance
instruction=intruction
interface=inteface
interrupt=interrrupt
Interrupt=Interupt
interrupt=intrrupt
interrupts=interrups
interval=intervall
invariant=invarient
invocation=invokation
isn't=is'nt
issuing=issueing
labeled=labelled
Length=Lenght
License=Licens
Licensed=Licenced
loosely=losely
management=managment,manangement
miscellaneous=miscellaneaous
modeled=modelled
mystery=mistery
necessary=neccessary,necessery
negative=negativ
negotiated=negociated
negotiation=negociation,neogtiation
No-one=Noone
nonexistent=nonexistant
noticeable=noticable
occurrance=occurence
occurred=occured
occurrences=occurances
occurring=occuring
original=orignal
Originally=Originaly
output=ouput
outputting=outputing
overridden=overidden,overriden
parameter=paramter
parameters=paramaters,paramters
particular=paticular
particularly=particularily
Pending=Pendings
Performance=Perfomance
performance=performace,preformance
Peripheral=Periferial
permissible=permissable
physical=hysical,phyiscal
potentially=potentally
preceded=preceeded
preceding=preceeding
presence=presense
privilege=priviledge
promiscuous=promiscous
Propagate=Propogate
prototypes=protoypes
Pseudo=Psuedo
publicly=publically
queuing=queing
really=realy
reasonable=reasonnable,resonable
receive=recevie
Receive=Recieve
receive=recieve
received=recieved
receiving=receving
referred=refered
regardless=regarless
Register=Regsiter,Reigster
registered=registred
registration=registaration
related=releated
relevant=relevent
remaining=remaing
remember=remeber
removable=removeable
renewed=renewd
requests's=requeusts
requests=requeuing
requeue=requests's
requeuing=requeue
reselection=relection
reset=resetted
resources=ressources
responsibility=responsability
retrieve=retreive
safely=savely
safety=saftey
sample=smaple
scatter=scather
scenario=scenerio
Separate=Seperate
Shouldn't=Shouldnt
shouldn't=shouldnt
signaled=signalled
Signaling=Signalling
signaling=signalling
Similarly=Similarily
specific=specfic,specifc
Specification=Specificiation
specified=specifed,speficied
specify=specifiy
specifying=specifing
straightforward=straighforward
structures=stuctures
succeeded=succeded
success=sucess
successful=succesful,successfull
successfully=sucessfully
sufficient=sufficent
superfluous=superflous
suppress=supress
swapped=swaped
synchronize=synchronyze
synchronizing=syncronizing
synchronous=syncronous
threshold=threshhold
through=throught,throuth
timing=timming
TORTUOUS=TORTIOUS
transaction=transation
transceiver=tranceiver
transferred=trasfered
transferring=transfering
translation=tranlation
transmission=transmition,transmittion
transmitter=transmiter
transmitting=transmiting
triggered=tiggered,triggerred
trigging=triggerg
truly=truely
ugliness=uglyness
underrun=underrrun
undesirable=undesireable
Unfortunately=Unfortunatly
unfortunately=unfortunatly
uninitialized=unitialized
unknown=unkown
usable=useable,usuable
useful=usefull
vertices=verticies
warranty=waranty
wasteful=watseful
weird=wierd
writable=writeable
Writing=Writting
writing=writting


Attachments:
spell-fix-dan1.txt (5.97 kB)

2003-03-02 02:41:18

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

My corrections file is up at http://www.kegel.com/spell-fix-dan1.txt
and the patch it produces is at
http://www.kegel.com/linux-2.5.63-bk5-spell.patch.bz2.bin
The perl script took about an hour of 450MHz cpu time.
(Might be worth adding a quick path to detect and skip
files with none of the misspelled words. Or just run
on a fast machine...)
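
One way such a quick path might look, sketched in C rather than in the Perl script itself: a plain substring scan over the whole file before any comment parsing. The name `may_need_fixing` is hypothetical, and the check is only a filter; a hit can still be in code or inside a longer word.

```c
#include <string.h>

/*
 * Return 1 if text contains any of the n misspellings as a substring,
 * 0 otherwise. A 0 means the file can be skipped outright; a 1 only
 * means the slower comment-aware pass is worth running.
 */
int may_need_fixing(const char *text, const char *const *wrong, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strstr(text, wrong[i]))
            return 1;
    return 0;
}
```

This still scans each file once per listed word, so the saving comes from skipping the per-comment substitution work on clean files, not from avoiding the read.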

I did a spot check, and it looked pretty good, but some
of the fixes are just too pedantic. In particular,
decrementor=decrementer
should probably be dropped from the fix list.

Any other changes people want to see in the script
or the corrections file? Should I add fixes for
uncommon errors (those that happen only in one or two files)?

- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-02 03:20:40

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 2003-03-01 at 19:08, Dan Kegel wrote:
[snipped]
>
> This corrections file is probably good enough to actually use.
> I'm running it against linux-2.5.63-bk5 now...
> - Dan
[snippage]
> Pseudo=Psuedo

Hmm, psuedo didn't get caught. Is psuedo code particularly smooth?

Anyway, looking beyond comments only may catch real bugs.
I found this a few days ago whilst looking for commonly
misspelled words, a much weaker technique than your spellchecker.

[steven@spc5 linus-2.5]$ bk export -tplain ../linux
[steven@spc5 linus-2.5]$ cd ../linux
[steven@spc5 linux]$ find . -type f | xargs grep psuedo
./arch/ppc64/kernel/iSeries_IoMmTable.h:/* allocated the psuedo I/O Address. */
./drivers/base/platform.c: * platform.c - platform 'psuedo' bus for legacy devices
./drivers/scsi/aic7xxx/aic7xxx.seq: * use byte 27 of the SCB as a psuedo-next pointer and to thread a list
./drivers/scsi/g_NCR5380.c: * Perform a psuedo DMA mode read from an NCR53C400 or equivalent
./drivers/scsi/g_NCR5380.c: * Perform a psuedo DMA mode read from an NCR53C400 or equivalent
./drivers/video/skeletonfb.c: * no color palettes are supported. Here a psuedo palette is created
./drivers/video/anakinfb.c: fb_info.psuedo_palette = colreg;
----------------------------------------^^^^^^
This shouldn't even compile.

[steven@spc5 linux]$ find . -name "*.h" | xargs grep pseudo_palette
./drivers/video/i810/i810.h: u32 pseudo_palette[17];
./include/linux/fb.h: void *pseudo_palette; /* Fake palette of 16 colors and

Yep, a mistake all right. Adding the listed author to the cc list.

Steven

2003-03-02 03:35:37

by jw schultz

Subject: Re: [PATCH] kernel source spellchecker

On Sat, Mar 01, 2003 at 01:20:09PM -0800, Dan Kegel wrote:
> Steven Cole wrote:
> >Once you've loosed your beast upon the tree, I'd suggest that you
> >very carefully look through the resulting diff for inappropriate
> >corrections and redact the unnecessary hunks. In the spelling fixes
> >which I sent to Linus, I redacted hunks which didn't need fixing. For
> >example, Linus making fun of Sun folks' ability to spell, etc. and some
> >comments in French or German for which the spelling was correct in those
> >languages.
>
> Good points.
>
> >In addition to making fixes in the comments in the source, all of
> >Documentation should be fair game.
>
> Yeah, but that's easy :-)
>
> >Then you'll have to contend with the folks whose out-of-tree patches
> >you've borked.
>
> That's a good argument for making the spellfix program polished
> enough that everyone can use it, I think. Those maintaining
> out-of-tree patches can run the tool on their tree, and regenerate
> diffs.

An ispell filter seems a simpler approach to me. (ispell -F
filter) I use that (shown here to head off requests) for
email so quoted content is ignored. A similar filter for C
source would make this trivial.


$ grep ispell .muttrc
set ispell="ispell -F maildequote"
$ cat bin/maildequote
#!/usr/bin/perl

while (<STDIN>)
{
/^[>|] / || /^On .* wrote:$/ and tr[A-Za-z][_];
print $_;
}
print "\004";
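
For C source, the same trick could be applied in reverse: blank out the letters of everything that is not a comment, so the spellchecker only sees comment text while line and column positions stay intact. A minimal sketch in C follows; `mask_noncomment` is a hypothetical name, not an existing ispell filter, and string literals are not parsed.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Overwrite every ASCII letter outside a C comment with '_', in place.
 * Length and line breaks are unchanged, mirroring the tr[A-Za-z][_]
 * used in the mail filter above.
 */
void mask_noncomment(char *s)
{
    enum { CODE, BLOCK, LINE } state = CODE;
    size_t i;

    for (i = 0; s[i]; i++) {
        if (state == CODE) {
            if (s[i] == '/' && s[i + 1] == '*') {
                state = BLOCK;
                i++;    /* step over the second delimiter byte */
            } else if (s[i] == '/' && s[i + 1] == '/') {
                state = LINE;
                i++;
            } else if (isalpha((unsigned char)s[i])) {
                s[i] = '_';
            }
        } else if (state == BLOCK) {
            if (s[i] == '*' && s[i + 1] == '/') {
                state = CODE;
                i++;
            }
        } else if (s[i] == '\n') {
            state = CODE;   /* an end-of-line comment stops here */
        }
    }
}
```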


--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2003-03-02 03:40:25

by Horst H. von Brand

Subject: Re: [PATCH] kernel source spellchecker

Dan Kegel <[email protected]> said:

[...]

> Smashing! However, it should probably avoid correcting spellings
> in anything but C comments.

Right.

> Perhaps my C comment parser should be converted to perl and
> incorporated into spell-fix.pl, and used to divide the source
> file into two streams (comment and noncomment); the comment
> stream would be spell-fixed and merged back with the noncomment
> stream to create the output.

I wouldn't go that far. Better give a list of speling mistakes (file/line)
and fix them by hand. It won't need to be done more than occasionally, so
the overhead is not too bad.
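
A minimal sketch of the report-only approach described above, assuming a small table of known misspellings. The names `KNOWN_TYPOS` and `report` are made up for illustration and are not part of any script in this thread. It lists file/line hits and leaves the file untouched, so each fix can be made by hand.

```python
import re

# Hypothetical subset of a corrections table, keyed by the misspelling.
KNOWN_TYPOS = {"transmiting": "transmitting", "tiggered": "triggered"}

WORD = re.compile(r"[A-Za-z]+")

def report(filename, text, typos=KNOWN_TYPOS):
    """Return 'file:line: wrong -> right' entries without touching the file."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD.finditer(line):
            word = match.group(0)
            fix = typos.get(word.lower())
            if fix:
                findings.append(f"{filename}:{lineno}: {word} -> {fix}")
    return findings
```

The output format mirrors compiler diagnostics, so most editors can jump straight to each hit.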
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2003-03-02 03:45:29

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 2003-03-01 at 20:02, Dan Kegel wrote:
> My corrections file is up at http://www.kegel.com/spell-fix-dan1.txt
[snip]
>
> Any other changes people want to see in the script
> or the corrections file? Should I add fixes for
> uncommon errors (those that happen only in one or two files)?

Correction:
transmitting=transmiting
triggered=tiggered,triggerred
trigging=triggerg
^^^^^^^^
This should be "triggering" here (I hope).

[steven@spc5 linux]$ find . -type f | xargs grep triggerg
./sound/isa/sb/emu8000_callback.c: for triggerg the voice */
./sound/isa/sb/emu8000_pcm.c: for triggerg the voice */
./sound/pci/emu10k1/emu10k1_callback.c: for triggerg the voice */

Steven

2003-03-02 04:07:15

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sat, 2003-03-01 at 20:02, Dan Kegel wrote:
> My corrections file is up at http://www.kegel.com/spell-fix-dan1.txt

>
> Any other changes people want to see in the script
> or the corrections file?

Another correction to the corrections file:

Licensed=Licenced
^^^^^^^^
I think Licenced is OK in the UK.
See http://www.gsu.edu/~wwwesl/egw/jones/differences.htm

Steven


2003-03-02 07:43:21

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Steven Cole wrote:
> trigging=triggerg
> ^^^^^^^^
> This should be "triggering" here (I hope).

Right, thanks. (I had that one right, once, but must have
dropped it on the floor.)

> Hmm, psuedo didn't get caught. Is psuedo code particularly smooth?

:-)

OK, http://www.kegel.com/spell-fix-dan2.txt is up, with the
following changes:

78d77
< decrementor=decrementer
158d156
< Licensed=Licenced
198a197
> pseudo=psuedo
271c270
< trigging=triggerg
---
> triggering=triggerg

The above covers errors in three or more source files.
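
For reference, a rough sketch of how entries in the `right=wrong1,wrong2` form used above could be loaded into a lookup table. The function name `parse_corrections` is hypothetical, and the actual parsing in spell-fix.pl may well differ.

```python
def parse_corrections(lines):
    """Map each misspelling to its correction from 'right=wrong1,wrong2' lines."""
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not an entry
        right, wrongs = line.split("=", 1)
        for wrong in wrongs.split(","):
            wrong = wrong.strip()
            if wrong:
                table[wrong] = right.strip()
    return table

# Entries taken from this thread.
entries = [
    "transmitting=transmiting",
    "triggered=tiggered,triggerred",
    "triggering=triggerg",
]
table = parse_corrections(entries)
```

Keying the table by the misspelling makes the later lookup a single dictionary access per word.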

The next logical step was to do the words misspelled in exactly
two source files. I did gather a list of them
( http://www.kegel.com/errors2.txt ) but
I don't have the energy to make a corrections file for those
right now. (FWIW, the procedure is: copy to a new file,
run aspell and consult original source, then use 'paste'
to join input and output of aspell into a two-column file)
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-02 08:01:00

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

jw schultz <jw () pegasys ! ws> wrote:

> An ispell filter seems a simpler approach to me. (ispell -F
> filter) I use that (shown here to head off requests) for
> email so quoted content is ignored. A similar filter for C
> source would make this trivial.
>
> $ grep ispell .muttrc
> set ispell="ispell -F maildequote"
> $ cat bin/maildequote
> #!/usr/bin/perl
>
> while (<STDIN>)
> {
> /^[>|] / || /^On .* wrote:$/ and tr[A-Za-z][_];
> print $_;
> }
> print "\004";

Integrating into existing spellcheckers is a Good Idea,
though it might not totally replace the perl script Matthias
wrote (does ispell have a batch mode that works on whole
directory trees?).

BTW, ispell on my system is GNU aspell,
and I couldn't tell for the life of me from the manual
whether it supports this kind of filter.
Nor could I find any doc on ispell filters.
Where's the best place to learn about 'em?
- Dan



--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-02 07:59:00

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

On Sat, Mar 01, 2003 at 07:02:00PM -0800, Dan Kegel wrote:
> My corrections file is up at http://www.kegel.com/spell-fix-dan1.txt
> and the patch that produces is
> http://www.kegel.com/linux-2.5.63-bk5-spell.patch.bz2.bin
> The perl script took about an hour of 450MHz cpu time.
> (Might be worth adding a quick path to detect and skip
> files with none of the misspelled words. Or just run
> on a fast machine...)

OK. Next Take.

Changes this time:
- A bug-fix for "--dir" (Would have checked all files)
- Added a "fast-path" but this doesn't seem to make a difference

New options:
- "--[no]fix" to fix (default) or only look for errors.
(This ignores the '[no]comment'-option and looks for all errors!)
- "--[no]override" to overwrite (default) the original file or create a
"<filename>.fixed" file
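
One possible shape for such a fast path, sketched under assumptions: the attached spell-fix.pl is not shown here, so this is not its implementation, and `build_fast_path` and `needs_fixing` are invented names. The idea is to compile all known misspellings into one alternation and test each file once before doing any per-word work.

```python
import re

def build_fast_path(misspellings):
    """Compile one alternation that matches any known misspelling as a whole word."""
    if not misspellings:
        raise ValueError("need at least one misspelling")
    # Longest first, so a longer entry is tried before any shorter prefix of it.
    ordered = sorted(misspellings, key=len, reverse=True)
    alternation = "|".join(re.escape(word) for word in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b")

def needs_fixing(text, fast_path):
    """Cheap pre-check: True only if at least one misspelling occurs in the text."""
    return fast_path.search(text) is not None
```

Whether this saves time depends on how many files are clean and on how the per-word pass is written, so it is worth measuring before relying on it.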


Anyone wants a "--[no]ask"-option?




Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2003-03-02 08:03:30

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

On Sun, Mar 02, 2003 at 09:09:10AM +0100, Matthias Schniedermeyer wrote:
> On Sat, Mar 01, 2003 at 07:02:00PM -0800, Dan Kegel wrote:
> > My corrections file is up at http://www.kegel.com/spell-fix-dan1.txt
> > and the patch that produces is
> > http://www.kegel.com/linux-2.5.63-bk5-spell.patch.bz2.bin
> > The perl script took about an hour of 450MHz cpu time.
> > (Might be worth adding a quick path to detect and skip
> > files with none of the misspelled words. Or just run
> > on a fast machine...)
>
> OK. Next Take.
>
> Changes this time:
> - A bug-fix for "--dir" (Would have checked all files)
> - Added a "fast-path" but this doesn't seem to make a difference
>
> New options:
> - "--[no]fix" to fix (default) or only look for errors.
> (This ignores the '[no]comment'-option and looks for all errors!)
> - "--[no]override" to overwrite (default) the original file or create a
> "<filename>.fixed" file
>
>
> Anyone wants a "--[no]ask"-option?

Sooner or later the "missing attachment" thing happens to everyone.
:-)




Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.


Attachments:
(No filename) (1.28 kB)
spell-fix.pl (5.55 kB)

2003-03-02 08:30:32

by jw schultz

Subject: Re: [PATCH] kernel source spellchecker

On Sun, Mar 02, 2003 at 12:21:47AM -0800, Dan Kegel wrote:
> jw schultz <jw () pegasys ! ws> wrote:
>
> >An ispell filter seems a simpler approach to me. (ispell -F
> >filter) I use that (shown here to head off requests) for
> >email so quoted content is ignored. A similar filter for C
> >source would make this trivial.
> >
> >$ grep ispell .muttrc
> >set ispell="ispell -F maildequote"
> >$ cat bin/maildequote
> >#!/usr/bin/perl
> >
> >while (<STDIN>)
> >{
> > /^[>|] / || /^On .* wrote:$/ and tr[A-Za-z][_];
> > print $_;
> >}
> >print "\004";
>
> Integrating in to existing spellcheckers is a Good Idea,
> though it might not totally replace the perl script Matthias
> wrote (does ispell have a batch mode that works on whole
> directory trees?).
>
> BTW, ispell on my system is gnu aspell,
> and I couldn't tell for the life of me from the manual
> whether it supports this kind of filter.
> Nor could I find any doc on ispell filters.
> Where's the best place to learn about 'em?

The manpage was my only reference. It was enough:

The -F switch specifies an external deformatter program.
This program should read data from its standard input and
write to its standard output. The program must produce
exactly one character of output for each character of
input, or ispell will lose synchronization and corrupt the
output file. Whitespace characters (especially blanks,
tabs, and newlines) and characters that should be spell-
checked should be passed through unchanged. Characters
that should not be spell-checked should be converted into
blanks or other non-word characters. For example, an HTML
deformatter might turn all HTML tags into blanks, and also
blank out all text delimited by tags such as "code" or
"kbd".

I don't know if aspell has filter support. I'm running
International Ispell Version 3.2.06 08/01/01
It came standard on SuSE.
http://fmg-www.cs.ucla.edu/geoff/ispell.html
It isn't GPL but the license terms are not unacceptable.
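
To make the manpage's contract concrete, here is a rough sketch of a deformatter for C source that keeps comment text and blanks everything else, one output character per input character. The function name `blank_non_comments` is made up, and wiring it to stdin/stdout for use with `ispell -F` is left out. It deliberately ignores string literals, so a "//" inside a string (a URL, say) would be misread as a comment start.

```python
def blank_non_comments(source):
    """Replace every character outside C comments with a blank, one for one."""
    out = []
    state = "code"  # "code", "block" (/* ... */) or "line" (// ...)
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        pair = source[i:i + 2]
        if state == "code":
            if pair == "/*" or pair == "//":
                state = "block" if pair == "/*" else "line"
                out.append("  ")  # blank the two-character delimiter
                i += 2
                continue
            # Whitespace passes through unchanged; everything else is blanked.
            out.append(ch if ch.isspace() else " ")
        elif state == "block":
            if pair == "*/":
                state = "code"
                out.append("  ")
                i += 2
                continue
            out.append(ch)
        else:  # inside a // comment, which ends at the newline
            if ch == "\n":
                state = "code"
            out.append(ch)
        i += 1
    return "".join(out)
```

Because the output has the same length and the same line breaks as the input, any word flagged in it sits at the same offset in the original file.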

--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2003-03-02 09:04:38

by John Bradford

Subject: Re: [PATCH] kernel source spellchecker

> > >This version defaults to correcting only words within a comment. ...
> > >// Comments are easy(tm). "Everything after // until line-end".
> > >
> > >and /* ... */ are easy(tm) too because gcc doesn't support nesting them.
> >
> > I'll be damned. I'm impressed with how easy that was in Perl.
>
> As long as there is no nesting involved most things are easy/trivial to
> achieve with REs.

Does it cope with:

main ()
{
// /*
printf ("hello world");
// */
}

though?

John.

2003-03-02 09:21:14

by Matthias Schniedermeyer

Subject: Re: [PATCH] kernel source spellchecker

On Sun, Mar 02, 2003 at 09:15:42AM +0000, John Bradford wrote:
> > > >This version defaults to correcting only words within a comment. ...
> > > >// Comments are easy(tm). "Everything after // until line-end".
> > > >
> > > >and /* ... */ are easy(tm) too because gcc doesn't support nesting them.
> > >
> > > I'll be damned. I'm impressed with how easy that was in Perl.
> >
> > As long as there is no nesting involved most things are easy/trivial to
> > achieve with REs.
>
> Does it cope with:
>
> main ()
> {
> // /*
> printf ("hello world");
> // */
> }
>
> though?

No. I could fix this, but I don't think it's worth it.
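
For what it's worth, one way the case above could be handled with REs alone is a single left-to-right alternation instead of separate passes for // and /* */. This is only a sketch, not how spell-fix.pl works, and the names `TOKEN` and `comments` are invented here. Whichever construct starts first wins, so a "/*" inside a // comment or a string literal never opens a block comment.

```python
import re

TOKEN = re.compile(
    r"""
      //[^\n]*                  # line comment, up to the end of the line
    | /\*.*?\*/                 # block comment, non-greedy, no nesting
    | "(?:\\.|[^"\\\n])*"       # string literal, matched only to be skipped
    | '(?:\\.|[^'\\\n])*'       # character literal, matched only to be skipped
    """,
    re.VERBOSE | re.DOTALL,
)

def comments(source):
    """Return the comments in source, in order, ignoring string contents."""
    return [
        m.group(0)
        for m in TOKEN.finditer(source)
        if m.group(0).startswith(("//", "/*"))
    ]

example = 'main ()\n{\n// /*\nprintf ("hello world");\n// */\n}\n'
found = comments(example)
```

Here `found` holds the two line comments, and the printf between them stays classified as code.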




Bis denn

--
Real Programmers consider "what you see is what you get" to be just as
bad a concept in Text Editors as it is in women. No, the Real Programmer
wants a "you asked for it, you got it" text editor -- complicated,
cryptic, powerful, unforgiving, dangerous.

2003-03-02 11:11:50

by David Woodhouse

Subject: Re: [PATCH] kernel source spellchecker

On Sun, 2003-03-02 at 04:16, Steven Cole wrote:

> Another correction to the corrections file:
>
> Licensed=Licenced
> ^^^^^^^^
> I think Licenced is OK in the UK.
> See http://www.gsu.edu/~wwwesl/egw/jones/differences.htm

'Licenced' is not OK in the UK; it should be corrected to 'Licensed'.

In the UK, 'licence' is a noun, 'license' is a verb -- just as with
practice/practise and advice/advise etc. in both variants of the
language.

I think we also want to add:

Decompressing=Uncompressing

You should also refrain from 'correcting' the already-correct British
spellings of 'modelled'.

It might also be worth adding a list of 'suspect' spellings -- which
require human intervention. Such items might include 'indices=indexes'
and 'erratum=errata' although you can't do it automatically because
sometimes the right-hand side is actually correct.
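
A small sketch of what such a two-tier scheme might look like, assuming exact-case matching as the corrections file seems to use. The table contents come from this thread, but the names `AUTO_FIX`, `SUSPECT` and `review_line` are hypothetical. Safe entries are applied; suspect ones are only reported for a human to judge.

```python
import re

AUTO_FIX = {"psuedo": "pseudo", "triggerg": "triggering"}  # applied automatically
SUSPECT = {"indexes": "indices", "errata": "erratum"}       # flagged, never changed

WORD = re.compile(r"[A-Za-z]+")

def review_line(line):
    """Apply safe fixes; return the new line plus suspect words left untouched."""
    flagged = []

    def substitute(match):
        word = match.group(0)
        if word in SUSPECT:
            flagged.append((word, SUSPECT[word]))
            return word  # leave it for a human to decide
        return AUTO_FIX.get(word, word)

    return WORD.sub(substitute, line), flagged
```

For example, `review_line("/* Avoid a chip errata, see psuedo code */")` fixes "psuedo" and returns "errata" in the flagged list with "erratum" as the suggestion.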

--
dwmw2

2003-03-02 13:40:45

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sun, 2003-03-02 at 04:21, David Woodhouse wrote:
> On Sun, 2003-03-02 at 04:16, Steven Cole wrote:
>
> > Another correction to the corrections file:
> >
> > Licensed=Licenced
> > ^^^^^^^^
> > I think Licenced is OK in the UK.
> > See http://www.gsu.edu/~wwwesl/egw/jones/differences.htm
>
> 'Licenced' is not OK in the UK; it should be corrected to 'Licensed'.
>
> In the UK, 'licence' is a noun, 'license' is a verb -- just as with
> practice/practise and advice/advise etc. in both variants of the
> language.

Thanks for the explanation.

>
> I think we also want to add:
>
> Decompressing=Uncompressing
>
> You should also refrain from 'correcting' the already-correct British
> spellings of 'modelled'.
>
> It might also be worth adding a list of 'suspect' spellings -- which
> require human intervention. Such items might include 'indices=indexes'
> and 'erratum=errata' although you can't do it automatically because
> sometimes the right-hand side is actually correct.

In my first pass through the tree, it looks like there are quite a few
_correct_ uses of errata, but there are indeed some of these:

./drivers/net/tulip/de2104x.c: /* Avoid a chip errata by prefixing a dummy entry. */

I think the errata/erratum issue requires careful editing.

Steven




2003-03-02 14:44:55

by David Woodhouse

Subject: Re: [PATCH] kernel source spellchecker

On Sun, 2003-03-02 at 13:49, Steven Cole wrote:

> > It might also be worth adding a list of 'suspect' spellings -- which
> > require human intervention.

> In my first pass through the tree, it looks like there are quite a few
> _correct_ uses of errata, but there are indeed some of these:
>
> ./drivers/net/tulip/de2104x.c: /* Avoid a chip errata by prefixing a dummy entry. */
>
> I think the errata/erratum issue requires careful editing.

Indeed -- that's my point. It's 'suspect' but not necessarily wrong.
Likewise 'indexes' which can be permissible too, when used as a verb,
but is more likely to just be a thinko for 'indices'.

--
dwmw2

2003-03-02 15:14:56

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

David Woodhouse wrote:
> On Sun, 2003-03-02 at 04:16,
> 'Licenced' is not OK in the UK; it should be corrected to 'Licensed'.
> In the UK, 'licence' is a noun, 'license' is a verb -- just as with
> practice/practise and advice/advise etc. in both variants of the
> language.

Thanks for the info. BTW it looks like
http://dictionary.cambridge.org/
is a good authority on whether a word is legal -- and now I understand
why it liked 'licence' but not 'licenced'.

> I think we also want to add:
>
> Decompressing=Uncompressing

I'd prefer to leave that one alone; it seems innocent enough to me.

> You should also refrain from 'correcting' the already-correct British
> spellings of 'modelled'.

OK. Any anti-British corrections are not by intent!

> It might also be worth adding a list of 'suspect' spellings -- which
> require human intervention. Such items might include 'indices=indexes'
> and 'erratum=errata' although you can't do it automatically because
> sometimes the right-hand side is actually correct.

Might be, but I'll leave that for another day. I'd rather focus
on correcting the uncontroversial and obvious howlers.

- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-02 16:51:30

by Jared Daniel J. Smith

Subject: Re: [PATCH] kernel source spellchecker

Regarding these two cautious comments:

==========================================================================
I wouldn't go that far. Better give a list of speling mistakes (file/line)
and fix them by hand. It won't need to be done more than occasionally, so
the overhead is not too bad. --Dr. Horst H. von Brand

It might also be worth adding a list of 'suspect' spellings -- which
require human intervention. Such items might include 'indices=indexes'
and 'erratum=errata' although you can't do it automatically because
sometimes the right-hand side is actually correct. --David Woodhouse
==========================================================================

I fully agree.

I have tried to automatically spell-check long, complex texts for years,
with numerous algorithms; all of them fail for one reason or another,
and I find that the only proper way to do it is the tedious work by hand.

Even a single lost pun because of overenthusiastic spellchecking is
not worth the cleanup. I would prefer to see typos than lose a single
intentional 'misspelling'. It would be best if you posted all changes
somewhere so that they could be verified manually.

Consider the following:

alignment=alignement
alignement is French; is this intentional?

constants=konstants
konstants is German; is this intentional?

consumer=comsumer
comsumer is a neologism: http://www.firstmonday.dk/issues/issue5_5/henshall/

Converted=Coverted
is it a pun on something 'hidden' or is it something transformed?

descriptor=decriptor,desciptor
is it descriptor or decrypter?

invocation=invokation
invokation is German; is this intentional?

negative=negativ
negativ is a legitimate non-English word; is this intentional?

signaled=signalled
signaling=Signalling
signaling=signalling
signalled is a legitimate alternate spelling of signaled.

succeeded=succeded
succeded could also be a typo for 'succeed'

through=throught,throuth
throught could also be a typo for 'thought'

writable=writeable
writeable is a legitimate alternate spelling of writable

Thank you,

-Jared


2003-03-02 17:11:47

by Bernd Petrovitsch

Subject: Re: [PATCH] kernel source spellchecker

"Jared Daniel J. Smith" <[email protected]> wrote:
[...]
>constants=konstants
>konstants is German; is this intentional?
[...]
>invocation=invokation
>invokation is German; is this intentional?

Both may look German (since they have a 'k' instead of a 'c') but
they seem more like Germanisms:
'constant' is (as a noun) "Konstante", the plural is "Konstanten"
(see e.g. http://dict.leo.org/?p=T8PXU.&search=constants).
'invokation' is absolutely not-existing in German, see
http://dict.leo.org/?p=T8PXU.&search=invocation. In case of function
one would use "Aufruf" (similar to "call a function").

Bernd, getting somewhat off-topic
--
Bernd Petrovitsch Email : [email protected]
g.a.m.s gmbh Fax : +43 1 205255-900
Prinz-Eugen-Straße 8 A-1040 Vienna/Austria/Europe
LUGA : http://www.luga.at


2003-03-02 17:36:43

by Werner Almesberger

Subject: Re: [PATCH] kernel source spellchecker

Bernd Petrovitsch wrote:
> 'invokation' is absolutely not-existing in German, see
> http://dict.leo.org/?p=T8PXU.&search=invocation. In case of function
> one would use "Aufruf" (similar to "call a function").

That's not entirely true. If you capitalize it, it means the act of
calling upon your God(s), i.e. something a distressed programmer may
in fact end up doing ;-)

- Werner

--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina [email protected] /
/_http://www.almesberger.net/____________________________________________/

2003-03-02 18:17:45

by Bernd Petrovitsch

Subject: Re: [PATCH] kernel source spellchecker

Werner Almesberger <[email protected]> wrote:
>Bernd Petrovitsch wrote:
>> 'invokation' is absolutely not-existing in German, see
>> http://dict.leo.org/?p=T8PXU.&search=invocation. In case of function
>> one would use "Aufruf" (similar to "call a function").
>
>That's not entirely true. If you capitalize it, it means the act of
>calling upon your God(s), i.e. something a distressed programmer may
>in fact end up doing ;-)

ACK (as usual, Werner is right):
http://www.xipolis.net/03411d6535c36372d9c8396ca394a93d/suche/abstract.php?shortname=felix&artikel_id=80254
Never heard of it before, though.

[ sorry for the long broken URL ]

Bernd

--
Bernd Petrovitsch Email : [email protected]
g.a.m.s gmbh Fax : +43 1 205255-900
Prinz-Eugen-Straße 8 A-1040 Vienna/Austria/Europe
LUGA : http://www.luga.at


2003-03-02 18:37:28

by Steven Cole

Subject: Re: [PATCH] kernel source spellchecker

On Sun, 2003-03-02 at 11:56, Jared Daniel J. Smith wrote:
> Regarding these two cautious comments:
>
> ==========================================================================
> I wouldn't go that far. Better give a list of speling mistakes (file/line)
> and fix them by hand. It won't need to be done more than occasionally, so
> the overhead is not too bad. --Dr. Horst H. von Brand
>
> It might also be worth adding a list of 'suspect' spellings -- which
> require human intervention. Such items might include 'indices=indexes'
> and 'erratum=errata' although you can't do it automatically because
> sometimes the right-hand side is actually correct. --David Woodhouse
> ==========================================================================
>
> I fully agree.
>
> I have tried to automatically spell-check long, complex texts for years,
> with numerous algorithms; all of them fail for one reason or another,
> and I find that the only proper way to do it is the tedious work by hand.
>
> Even a single lost pun because of overenthusiastic spellchecking is
> not worth the cleanup. I would prefer to see typos than lose a single
> intentional 'misspelling'. It would be best if you posted all changes
> somewhere so that they could be verified manually.

More agreement. In this case it's better to commit a sin of omission
than one of commission. In my recent cleanups, here are three cases
which were left alone:

./arch/sparc/kernel/head.S: * Sun people can't spell worth damn. "compatability" indeed.
./drivers/net/myri_sbus.h: u32 shakedown; /* DarkkkkStarrr Crashesss... */
./drivers/scsi/FlashPoint.c: return(0); /*We WON! Yeeessss! */

>
> Consider the following:
>
[snip]
>
> Converted=Coverted
> is it a pun on something 'hidden' or is it something transformed?
>
./drivers/media/radio/radio-aimslab.c: * Coverted to new API by Alan Cox <[email protected]>
./drivers/media/radio/radio-gemtek.c: * Coverted to new API by Alan Cox <[email protected]>
./drivers/media/radio/radio-rtrack2.c: * Coverted to new API by Alan Cox <[email protected]>

Alan's humor can be subtle. Better to ask him. AC added to cc list.
I just hope he doesn't start to pun "yn Cymraeg"

Steven


2003-03-02 21:18:11

by Alan

Subject: Re: [PATCH] kernel source spellchecker

On Sun, 2003-03-02 at 18:46, Steven Cole wrote:
> ./drivers/media/radio/radio-aimslab.c: * Coverted to new API by Alan Cox <[email protected]>
> ./drivers/media/radio/radio-gemtek.c: * Coverted to new API by Alan Cox <[email protected]>
> ./drivers/media/radio/radio-rtrack2.c: * Coverted to new API by Alan Cox <[email protected]>
>
> Alan's humor can be subtle. Better to ask him. AC added to cc list.

Cut & waste accident. Those should be fixed

2003-03-02 21:31:25

by Alan

Subject: Re: [PATCH] kernel source spellchecker


> > I think we also want to add:
> >
> > Decompressing=Uncompressing

Both are commonly used. People are going too far. Fixing typos that are
confusing or blatantly daft is one thing, but if you want to pick over
documentation line by line with a copy of Fowler's in hand, the Gnome and
KDE projects would both love to have you working over their
documentation and end user manuals ;)


2003-03-02 22:49:14

by John Bradford

Subject: Re: [PATCH] kernel source spellchecker

> > > I think we also want to add:
> > >
> > > Decompressing=Uncompressing
>
> Both are commonly used.

To me, 'decompressed' suggests that something was definitely once
compressed, whereas 'uncompressed' suggests that it may never have been.

> People are going too far.

I totally agree. The possibility for introducing more errors is growing.

> Fixing typos that are confusing or blatantly daft is one thing

Things like 'teh' instead of 'the' are easily corrected, and fixing them
is useful for grepping through the kernel source.

> but if you want to pick over documentation line by line with a copy
> of Fowler's

What _would_ be useful, would be a script to validate all of the email
addresses in comments in the kernel source. I found a typo in an
E-Mail address once, and I'm sure there are probably more.
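
As a starting point, a sketch of a purely syntactic check, assuming addresses in comments are written in angle brackets. The names `ADDRESS`, `VALID` and `suspicious_addresses` are invented for this example. It can catch a comma typed for a dot or a missing domain, but it cannot tell whether a mailbox actually exists.

```python
import re

# Anything in angle brackets that contains an '@' is treated as a candidate.
ADDRESS = re.compile(r"<([^<>\s]+@[^<>\s]*)>")
# Deliberately conservative shape: local@label.label with a letters-only TLD.
VALID = re.compile(r"^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$")

def suspicious_addresses(text):
    """Return bracketed addresses whose syntax looks wrong."""
    return [addr for addr in ADDRESS.findall(text) if not VALID.match(addr)]
```

Includes such as `#include <linux/kernel.h>` are skipped because they contain no '@'.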

John.

2003-03-03 02:08:16

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Alan Cox wrote:
>>>I think we also want to add:
>>>
>>>Decompressing=Uncompressing
>
>
> Both are commonly used. People are going too far. Fixing typos that are
> confusing or blatantly daft is one thing, but if you want to pick over
> documentation line by line with a copy of Fowler's in hand, the Gnome and
> KDE projects would both love to have you working over their
> documentation and end user manuals ;)

Agreed. Confusing and blatantly daft typos are my intended target.

I've put all of my scripts and data up at
http://www.kegel.com/kerspell
including an ispell filter (haven't tried it out yet)
and a stopword list.

Here's example output from my lspell.pl script using my stopword list:

linux-2.5.63-bk5.old/include/asm-s390x/atomic.h: 1

enviroment

linux-2.5.63-bk5.old/include/asm-s390x/rwsem.h: 1

consequtive

linux-2.5.63-bk5.old/include/asm-s390x/dasd.h: 3

featueres Perfomance requests's

linux-2.5.63-bk5.old/include/asm-s390x/pgtable.h: 3

lenght regiontable specifiation

...
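
A rough illustration of how a stopword list feeds into a report of this shape. This is not lspell.pl itself: a plain set stands in for the system spellchecker, and the names `unknown_words` and `format_report` are made up.

```python
import re

WORD = re.compile(r"[A-Za-z']+")

def unknown_words(comment_text, dictionary, stopwords):
    """Words in neither the dictionary nor the stopword list, sorted ignoring case."""
    seen = {
        word
        for word in WORD.findall(comment_text)
        if word.lower() not in dictionary and word not in stopwords
    }
    return sorted(seen, key=lambda word: (word.lower(), word))

def format_report(filename, words):
    """Mimic the 'file: count' header followed by the flagged words."""
    return f"{filename}: {len(words)}\n\n{' '.join(words)}\n"
```

The stopword list is where kernel jargon that no dictionary knows would go, so that only real candidates reach the report.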


- Dan


--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-03 03:33:24

by Steven Cole

Subject: [PATCH] 2.5.63-current fix Coverted -> Converted , was (Re: [PATCH] kernel source spellchecker)

This patch fixes what might have been a joke, but wasn't.

Coverted -> Converted

One down, 285 to go.

Steven

On Sun, 2003-03-02 at 15:32, Alan Cox wrote:
> On Sun, 2003-03-02 at 18:46, Steven Cole wrote:
> > ./drivers/media/radio/radio-aimslab.c: * Coverted to new API by Alan Cox <[email protected]>
> > ./drivers/media/radio/radio-gemtek.c: * Coverted to new API by Alan Cox <[email protected]>
> > ./drivers/media/radio/radio-rtrack2.c: * Coverted to new API by Alan Cox <[email protected]>
> >
> > Alan's humor can be subtle. Better to ask him. AC added to cc list.
>
> Cut & waste accident. Those should be fixed
>
>

diff -ur bk-current/drivers/media/radio/radio-aimslab.c linux/drivers/media/radio/radio-aimslab.c
--- bk-current/drivers/media/radio/radio-aimslab.c Sun Mar 2 20:12:31 2003
+++ linux/drivers/media/radio/radio-aimslab.c Sun Mar 2 20:30:04 2003
@@ -1,6 +1,6 @@
/* radiotrack (radioreveal) driver for Linux radio support
* (c) 1997 M. Kirkwood
- * Coverted to new API by Alan Cox <[email protected]>
+ * Converted to new API by Alan Cox <[email protected]>
* Various bugfixes and enhancements by Russell Kroll <[email protected]>
*
* History:
diff -ur bk-current/drivers/media/radio/radio-gemtek.c linux/drivers/media/radio/radio-gemtek.c
--- bk-current/drivers/media/radio/radio-gemtek.c Sun Mar 2 20:11:13 2003
+++ linux/drivers/media/radio/radio-gemtek.c Sun Mar 2 20:30:26 2003
@@ -8,7 +8,7 @@
* RadioTrack II driver for Linux radio support (C) 1998 Ben Pfaff
*
* Based on RadioTrack I/RadioReveal (C) 1997 M. Kirkwood
- * Coverted to new API by Alan Cox <[email protected]>
+ * Converted to new API by Alan Cox <[email protected]>
* Various bugfixes and enhancements by Russell Kroll <[email protected]>
*
* TODO: Allow for more than one of these foolish entities :-)
diff -ur bk-current/drivers/media/radio/radio-rtrack2.c linux/drivers/media/radio/radio-rtrack2.c
--- bk-current/drivers/media/radio/radio-rtrack2.c Sun Mar 2 20:14:08 2003
+++ linux/drivers/media/radio/radio-rtrack2.c Sun Mar 2 20:30:36 2003
@@ -1,7 +1,7 @@
/* RadioTrack II driver for Linux radio support (C) 1998 Ben Pfaff
*
* Based on RadioTrack I/RadioReveal (C) 1997 M. Kirkwood
- * Coverted to new API by Alan Cox <[email protected]>
+ * Converted to new API by Alan Cox <[email protected]>
* Various bugfixes and enhancements by Russell Kroll <[email protected]>
*
* TODO: Allow for more than one of these foolish entities :-)

2003-03-03 05:15:14

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

"Jared Daniel J. Smith" wrote:
> Even a single lost pun because of overenthusiastic spellchecking is
> not worth the cleanup. I would prefer to see typos than lose a single
> intentional 'misspelling'. It would be best if you posted all changes
> somewhere so that they could be verified manually.
>
> Consider the following:
>
> alignment=alignement
> alignement is French; is this intentional?

No. All three instances were English typos.

> constants=konstants
> konstants is German; is this intentional?

No. All three instances were English typos.

> consumer=comsumer
> comsumer is a neologism: http://www.firstmonday.dk/issues/issue5_5/henshall/

That may be, but in the neologism, it seems to be usually partially capitalized,
and the C source sure looks like just a typo:
/* producer/comsumer pointers for Tx/Rx ring */

> Converted=Coverted
> is it a pun on something 'hidden' or is it something transformed?

Alan said it was a copy/paste error and should be fixed.

> descriptor=decriptor,desciptor
> is it descriptor or decrypter?

You be the judge:
/* Initiliaze Transmit/Receive decriptor and CR3/4 */

All instances I saw were just English typos.

> invocation=invokation
> invokation is German; is this intentional?

No idea, seems to be gone in the current kernel source?

> negative=negativ
> negativ is a legitimate non-English word; is this intentional?

Where I spot-checked it, it was always just an English typo.

> signaled=signalled
> signaling=Signalling
> signaling=signalling
> signalled is a legitimate alternate spelling of signaled.

Thanks, fixed!

> succeeded=succeded
> succeded could also be a typo for 'succeed'
>
> through=throught,throuth
> throught could also be a typo for 'thought'

Yes. These will have to be hand-reviewed. I do recommend absolutely
every change be hand-reviewed just in case.

> writable=writeable
> writeable is a legitimate alternate spelling of writable

You're right, though I had to dig to find a dictionary that agreed with you.

I've updated http://www.kegel.com/kerspell to remove the "signall*" and "writeable"
corrections. (My stoplist already listed them as acceptable, fwiw.)

Thanks!
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-03 05:27:28

by Dan Kegel

Subject: Re: [PATCH] kernel source spellchecker

Steven Cole wrote:
> BTW, I ran the spell-fix.pl script using only the first 10 entries
> of spell-fix.txt (an arbitrary choice), and got a pretty big diff:
>
> 132 files changed, 199 insertions(+), 199 deletions(-)
>
> My feeling is that patches should be about 1/4 that size.
> Otherwise, Linus may /dev/null them. Second opinion anyone?

I agree with the 1/4th, but in # of lines of spell-fix.txt, not
in output size.

Looking at the changesets he's accepted, it looks like he's
comfortable with changesets of 100 files (see "don't" fixes at
http://www.kernel.org/pub/linux/kernel/v2.5/testing/cset/cset-1.1025.1.38.txt )
It probably helps that this was a single kind of change.

My guess is these things need enough manual reviewing
that keeping it down to just a related group of fixes per patch
is a good idea (e.g.
Acknowledge=Acknowlege
acknowledged=acknoledged
together are a Good Idea, more is probably bad).

BTW Linus has been accepting so many spell fixes it's probably important
to work with very fresh sources...
- Dan

--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045

2003-03-07 10:58:27

by Pavel Machek

Subject: Re: [PATCH] kernel source spellchecker

Hi!

> > Here is a list of corrections...I have omitted those that seem OK to me,
> > apostrophes, proper names, some that seem to be hyphenation-related, American
> > vs. British differences and a few others.
> >
> > In the case of broken American spelling, I have provided American fixes
> > (against my better judgement :-)). Enjoy...
> >
> > brain-damaged=dain-bramaged,dain bramaged
>
> What's wrong with those? Yes, they don't exist in a dictionary, but
> neither do a

Please leave those in. Don't kill jokes
just for "spell checking".

These obviously are NOT typos; author
wanted it that way.

Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...