2001-02-27 13:05:28

by Ivo Timmermans

Subject: binfmt_script and ^M

When running a script (perl in this case) that has DOS-style newlines
(\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
recognize the \r. The following patch should fix this (untested).

Please Cc me on replies, I'm not on this list. Thanks.


--- binfmt_script.c~ Mon Feb 26 17:42:09 2001
+++ binfmt_script.c Tue Feb 27 13:39:47 2001
@@ -36,6 +36,8 @@
bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
if ((cp = strchr(bprm->buf, '\n')) == NULL)
cp = bprm->buf+BINPRM_BUF_SIZE-1;
+ if (cp - 1 == '\r')
+ cp--;
*cp = '\0';
while (cp > bprm->buf) {
cp--;

--
Ivo Timmermans


2001-02-27 13:34:38

by Heusden, Folkert van

Subject: RE: binfmt_script and ^M

> When running a script (perl in this case) that has DOS-style newlines
> (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> recognize the \r. The following patch should fix this (untested).

_should_ it work with the \r in it?

There might be a problem with your patch: at the '*)': if the '\n' is the
first character on the line, the cp-1 (which should be *(cp-1) I think)
would point before the buffer which can be un-allocated memory.



--- binfmt_script.c~ Mon Feb 26 17:42:09 2001
+++ binfmt_script.c Tue Feb 27 13:39:47 2001
@@ -36,6 +36,8 @@
bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
if ((cp = strchr(bprm->buf, '\n')) == NULL)
cp = bprm->buf+BINPRM_BUF_SIZE-1;
+ if (cp - 1 == '\r') <------- *)
+ cp--;
*cp = '\0';
while (cp > bprm->buf) {
cp--;
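
The two points about the marked line can be sketched in user-space C. This is only an illustration and not the kernel code: the helper name `strip_cr_before` is made up, and the explicit `cp > buf` test is an assumption added to cover the case where the terminator is the very first byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of binfmt_script.c.
 * BUF is a NUL-terminated buffer and CP points at the line terminator
 * found in it.  If the byte just before CP is a carriage return, return a
 * pointer to that byte so the caller can cut the line there; otherwise
 * return CP unchanged.
 *
 * The patch as posted wrote `cp - 1 == '\r'`, which compares the pointer
 * value itself with 13.  The byte has to be read with `*(cp - 1)` or
 * `cp[-1]`, and only after checking that CP is not the first byte. */
char *strip_cr_before(char *buf, char *cp)
{
    assert(buf != NULL && cp != NULL && cp >= buf);
    if (cp > buf && cp[-1] == '\r')
        cp--;
    return cp;
}
```

A caller would do `cp = strip_cr_before(buf, cp); *cp = '\0';` in place of the two added lines of the patch.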


Greetings,
Folkert van Heusden
[ http://www.vanheusden.com ]

2001-02-27 13:38:49

by Ivo Timmermans

Subject: Re: binfmt_script and ^M

Heusden, Folkert van wrote:
> > When running a script (perl in this case) that has DOS-style newlines
> > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > recognize the \r. The following patch should fix this (untested).
>
> _should_ it work with the \r in it?

IMHO, yes. This set of files were created on Windows, then zipped and
uploaded to a Linux server, unpacked. This does not change the \r.

> There might be a problem with your patch: at the '*)': if the '\n' is the
> first character on the line, the cp-1 (which should be *(cp-1) I think)

You're right there.

> would point before the buffer which can be un-allocated memory.

No, the first two characters are always `#!'.

> + if (cp - 1 == '\r') <------- *)


--
Ivo Timmermans

2001-02-27 13:43:29

by Alan

Subject: Re: binfmt_script and ^M

> When running a script (perl in this case) that has DOS-style newlines
> (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> recognize the \r. The following patch should fix this (untested).

Fix the script. The kernel expects a specific format

Alan

2001-02-27 13:45:39

by Heusden, Folkert van

Subject: RE: binfmt_script and ^M

> > When running a script (perl in this case) that has DOS-style newlines
> > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > recognize the \r. The following patch should fix this (untested).
> _should_ it work with the \r in it?
IV> IMHO, yes. This set of files were created on Windows, then zipped and
IV> uploaded to a Linux server, unpacked. This does not change the \r.

But it's not that much of a hassle to run it through some awk/sed/whatever
script, is it? IMHO there should be as little code as possible in the
kernel that could also have been done in user space.

> + if (cp - 1 == '\r') <------- *)
> There might be a problem with your patch: at the '*)': if the '\n' is the
> first character on the line, the cp-1 (which should be *(cp-1) I think)
IV> You're right there.

Phew, then I have at least 1 thing right in my message since I was wrong
with:

> would point before the buffer which can be un-allocated memory.

If only I had read the code myself :o)

IV> No, the first two characters are always `#!'.

Yes, absolutely right.

2001-02-27 13:48:19

by Ivo Timmermans

Subject: Re: binfmt_script and ^M

Alan Cox wrote:
> > When running a script (perl in this case) that has DOS-style newlines
> > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > recognize the \r. The following patch should fix this (untested).
>
> Fix the script. The kernel expects a specific format

For what reason? Is it a standard to not allow it, or does it break
other things?


--
Ivo Timmermans

2001-02-27 13:52:09

by Alan

Subject: Re: binfmt_script and ^M

> > > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > > recognize the \r. The following patch should fix this (untested).
> >
> > Fix the script. The kernel expects a specific format
>
> For what reason? Is it a standard to not allow it, or does it break
> other things?

The line terminator is \n so if you have

#!/usr/bin/perl\r\n

Then the command to run is "/usr/bin/perl\r" - and \r is a valid file name
component
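
A user-space sketch of that parsing rule, under the assumption that only '\n' ends the line and only space and tab separate the interpreter from anything after it. The function name and buffer handling are made up for the example and are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: copy the interpreter path named on a "#!" line into
 * OUT (at most OUTLEN bytes including the NUL).  Returns 0 on success,
 * -1 if LINE does not start with "#!" or the path does not fit.
 * Assumed rules: '\n' ends the line; space and tab are the only
 * separators.  A '\r' is therefore copied like any other path byte. */
int shebang_interpreter(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);
    if (strncmp(line, "#!", 2) != 0)
        return -1;

    const char *p = line + 2;
    p += strspn(p, " \t");                  /* skip blanks after "#!" */
    size_t n = strcspn(p, " \t\n");         /* path ends at blank or '\n' */
    if (n + 1 > outlen)
        return -1;

    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}
```

Given "#!/usr/bin/perl\r\n" this produces the 14-byte name "/usr/bin/perl\r", which is a different file name from "/usr/bin/perl".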

2001-02-27 14:04:55

by Bruce Harada

Subject: Re: binfmt_script and ^M

On Tue, 27 Feb 2001 14:38:23 +0100
Ivo Timmermans <[email protected]> wrote:
> Heusden, Folkert van wrote:
> > > When running a script (perl in this case) that has DOS-style
> newlines
> > > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > > recognize the \r. The following patch should fix this (untested).
> >
> > _should_ it work with the \r in it?
>
> IMHO, yes. This set of files were created on Windows, then zipped and
> uploaded to a Linux server, unpacked. This does not change the \r.

Unzipping the files with the "-ll" option should fix that. There's no
particular reason why the kernel should handle CR+LF; LF has been the
end-of-line character for UN*X systems since Adam was a cowboy.
Changing it now would only lead to a situation where some things would
work with CR+LF and others wouldn't. Let's keep it simple...

--
Bruce Harada
[email protected]

2001-02-27 14:26:58

by Alistair Riddell

Subject: RE: binfmt_script and ^M

On Tue, 27 Feb 2001, Heusden, Folkert van wrote:

> But it's not that much of a hassle to run it through some awk/sed/whatever
> script, is it? IMHO there should be as little code as possible in the

man fromdos (on most linux systems anyway)

--
Alistair Riddell - BOFH
IT Support Department, George Watson's College, Edinburgh
Tel: +44 131 447 7931 Ext 176 Fax: +44 131 452 8594
Microsoft - because god hates us

2001-02-27 14:37:18

by Rogier Wolff

Subject: Re: binfmt_script and ^M

Alan Cox wrote:
> > > > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > > > recognize the \r. The following patch should fix this (untested).
> > >
> > > Fix the script. The kernel expects a specific format
> >
> > For what reason? Is it a standard to not allow it, or does it break
> > other things?
>
> The line terminator is \n so if you have
>
> #!/usr/bin/perl\r\n
>
> Then the command to run is "/usr/bin/perl\r" - and \r is a valid file name
> component

Agreed. If you insist "fix" it with.....

cd /usr/bin
ln -s perl $'perl\r'

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2001-02-27 19:21:36

by Jamie Lokier

Subject: Re: binfmt_script and ^M

Ivo Timmermans wrote:
> > _should_ it work with the \r in it?
>
> IMHO, yes. This set of files were created on Windows, then zipped and
> uploaded to a Linux server, unpacked. This does not change the \r.

Use `fromdos' to convert the files. Or this little Perl gem, which
takes a list of files or standard input as argument:

#!/usr/bin/perl -pi
s/\r\n$/\n/
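
For comparison, roughly the same conversion as a C sketch that works on a memory buffer. The function name is made up and all file handling is left out, so treat it as an illustration of the rule only: a '\r' is dropped only when it sits directly in front of a '\n'.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper: rewrite BUF in place so that every "\r\n" pair
 * becomes "\n".  A '\r' that is not followed by '\n' is kept.
 * Returns the new length; the result is not NUL-terminated by this
 * function. */
size_t crlf_to_lf(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;                       /* drop the CR of a CRLF pair */
        buf[out++] = buf[in];
    }
    return out;
}
```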

-- Jamie

2001-02-27 20:13:22

by Don Dugger

Subject: Re: binfmt_script and ^M

Isn't `perl' overkill? Why not just:

tr -d '\r'

On Tue, Feb 27, 2001 at 08:20:59PM +0100, Jamie Lokier wrote:
> Ivo Timmermans wrote:
> > > _should_ it work with the \r in it?
> >
> > IMHO, yes. This set of files were created on Windows, then zipped and
> > uploaded to a Linux server, unpacked. This does not change the \r.
>
> Use `fromdos' to convert the files. Or this little Perl gem, which
> takes a list of files or standard input as argument:
>
> #!/usr/bin/perl -pi
> s/\r\n$/\n/
>
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
[email protected]
Ph: 303/938-9838

2001-02-27 21:35:58

by Rogier Wolff

Subject: Re: binfmt_script and ^M

Ivo Timmermans wrote:
> Heusden, Folkert van wrote:
> > > When running a script (perl in this case) that has DOS-style newlines
> > > (\r\n), Linux 2.4.2 can't find an interpreter because it doesn't
> > > recognize the \r. The following patch should fix this (untested).
> >
> > _should_ it work with the \r in it?
>
> IMHO, yes. This set of files were created on Windows, then zipped and
> uploaded to a Linux server, unpacked. This does not change the \r.

Use the right option on "unzip" to unpack with cr/lf conversion.

Otherwise, use a script that does it afterwards.

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2001-02-27 21:36:51

by Tim Waugh

Subject: [OT] Re: binfmt_script and ^M

On Tue, Feb 27, 2001 at 12:59:48PM -0700, Don Dugger wrote:

> Isn't `perl' overkill? Why not just:
>
> tr -d '\r'

while read line; do echo ${line%?}; done

2001-02-27 23:16:55

by Jamie Lokier

Subject: Re: [OT] Re: binfmt_script and ^M

Tim Waugh wrote:
> > Isn't `perl' overkill? Why not just:
> >
> > tr -d '\r'
>
> while read line; do echo ${line%?}; done

And those can convert a set of files the way "fromdos *.c" can, can they?

:-)
-- Jamie

2001-02-27 23:24:05

by David Ford

Subject: Re: binfmt_script and ^M

Alistair Riddell wrote:

> On Tue, 27 Feb 2001, Heusden, Folkert van wrote:
>
>> But it's not that much of a hassle to run it through some awk/sed/whatever
>> script, is it? IMHO there should be as little code as possible in the
>
>
> man fromdos (on most linux systems anyway)
>

tr -d '\r' < infile > outfile

We wouldn't make the kernel translate m$ word docs into files the kernel
can parse. It's a userland thing and changing the kernel would change a
legacy that would cause a lot of confusion I would expect.

-d

2001-02-28 14:07:46

by Jamie Lokier

Subject: Re: binfmt_script and ^M

David wrote:
> We wouldn't make the kernel translate m$ word docs into files the kernel
> can parse. It's a userland thing and changing the kernel would change a
> legacy that would cause a lot of confusion I would expect.

Now there's a thought. binfmt_fileextension, chooses the interpreter
based on filename :-)

-- Jamie

2001-02-28 20:11:19

by Erik Hensema

Subject: Re: binfmt_script and ^M

On Tue, Feb 27, 2001 at 01:44:08PM +0000, Alan Cox wrote:
> > When running a script (perl in this case) that has DOS-style
> > newlines (\r\n), Linux 2.4.2 can't find an interpreter because it
> > doesn't recognize the \r. The following patch should fix this
> > (untested).

> Fix the script. The kernel expects a specific format


How about letting the kernel return ENOEXEC instead of ENOENT? It would
give the luser just the little extra hint about converting their files to
Unix format.

$ ls
testscript
$ head -1 testscript
#!/bin/sh
$ ./testscript
bash: ./testscript: No such file or directory

versus

$ ./testscript
bash: ./testscript: Exec format error

I haven't got a clue what Posix requires though.
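
The errno in question can be observed from user space with a small helper like the one below. It is only a sketch: `exec_errno` is a made-up name, and it simply reports whatever execve() fails with; it says nothing about what POSIX requires.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper: try to exec PATH in a child process.
 * Returns the errno that execve() failed with, 0 if the exec succeeded,
 * or -1 if the pipe or fork could not be set up. */
int exec_errno(const char *path)
{
    assert(path != NULL);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    /* Close-on-exec: a successful exec closes the write end silently. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                         /* child */
        char *argv[] = { (char *)path, NULL };
        char *envp[] = { NULL };
        close(fds[0]);
        execve(path, argv, envp);
        int err = errno;                    /* only reached on failure */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);                          /* parent */
    int err = 0;
    ssize_t n = read(fds[0], &err, sizeof err);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == (ssize_t)sizeof err ? err : 0;
}
```

Calling it on the script from the example and comparing the result with ENOENT and ENOEXEC shows which of the two the kernel currently returns.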

--
Erik Hensema ([email protected])

2001-02-28 22:23:58

by H. Peter Anvin

Subject: Re: binfmt_script and ^M

Followup to: <[email protected]>
By author: Jamie Lokier <[email protected]>
In newsgroup: linux.dev.kernel
>
> David wrote:
> > We wouldn't make the kernel translate m$ word docs into files the kernel
> > can parse. It's a userland thing and changing the kernel would change a
> > legacy that would cause a lot of confusion I would expect.
>
> Now there's a thought. binfmt_fileextension, chooses the interpreter
> based on filename :-)
>

binfmt_misc?

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-03-02 22:04:49

by Pavel Machek

Subject: Re: binfmt_script and ^M

Hi!

> > > When running a script (perl in this case) that has DOS-style
> > > newlines (\r\n), Linux 2.4.2 can't find an interpreter because it
> > > doesn't recognize the \r. The following patch should fix this
> > > (untested).
>
> > Fix the script. The kernel expects a specific format
>
>
> How about letting the kernel return ENOEXEC instead of ENOENT? It would
> give the luser just the little extra hint about converting their files to
> Unix format.
>
> $ ls
> testscript
> $ head -1 testscript
> #!/bin/sh
> $ ./testscript
> bash: ./testscript: No such file or directory

What kernel wants to say is "/usr/bin/perl\r: no such file". Saying ENOEXEC
would be even more confusing.
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-03-05 13:20:55

by Jan Nieuwenhuizen

Subject: Re: binfmt_script and ^M

Pavel Machek <[email protected]> writes:

> > $ head -1 testscript
> > #!/bin/sh
> > $ ./testscript
> > bash: ./testscript: No such file or directory
>
> What kernel wants to say is "/usr/bin/perl\r: no such file". Saying ENOEXEC
> would be even more confusing.

So, why don't we make bash say that, then? As I guess that we've all
been bitten by this before.

What are the chances for something like this to be included?

Greetings,
Jan.


--- ../bash-2.04/execute_cmd.c Tue Jan 25 17:29:11 2000
+++ ./execute_cmd.c Mon Mar 5 13:50:23 2001
@@ -3035,6 +3035,42 @@
}
}

+/* Look for #!INTERPRETER in file COMMAND, and return INTERPRETER . */
+static char *
+extract_hash_bang_interpreter (char *command, char buf[80])
+{
+ int fd;
+ char *interpreter;
+
+ interpreter = "";
+ fd = open (command, O_RDONLY);
+ if (fd >= 0)
+ {
+ int len;
+
+ len = read (fd, (char *)buf, 80);
+ close (fd);
+
+ if (len > 0
+ && buf[0] == '#' && buf[1] == '!')
+ {
+ int i;
+ int start;
+
+ for (i = 2; whitespace (buf[i]) && i < len; i++)
+ ;
+
+ for (start = i;
+ !whitespace (buf[i]) && buf[i] != '\n' && i < len;
+ i++)
+ ;
+
+ interpreter = substring ((char *)buf, start, i);
+ }
+ }
+ return interpreter;
+}
+
/* Execute a simple command that is hopefully defined in a disk file
somewhere.

@@ -3155,7 +3191,12 @@

if (command == 0)
{
- internal_error ("%s: command not found", pathname);
+ char buf[80];
+ char *interpreter = extract_hash_bang_interpreter (pathname, buf);
+
+ internal_error ("%s: command not found: `%s'", pathname,
+ interpreter);
+
exit (EX_NOTFOUND); /* Posix.2 says the exit status is 127 */
}



--
Jan Nieuwenhuizen <[email protected]> | GNU LilyPond - The music typesetter
http://www.xs4all.nl/~jantien | http://www.lilypond.org

2001-03-05 13:37:37

by Andreas Schwab

Subject: Re: binfmt_script and ^M

Jan Nieuwenhuizen <[email protected]> writes:

|> Pavel Machek <[email protected]> writes:
|>
|> > > $ head -1 testscript
|> > > #!/bin/sh
|> > > $ ./testscript
|> > > bash: ./testscript: No such file or directory
^^^^^^^^^^^^^^^^^^^^^^^^^
|> >
|> > What kernel wants to say is "/usr/bin/perl\r: no such file". Saying ENOEXEC
|> > would be even more confusing.
|>
|> So, why don't we make bash say that, then? As I guess that we've all
|> been bitten by this before.
|>
|> What are the chances for something like this to be included?

Very low, because it would not change anything.

|> @@ -3155,7 +3191,12 @@
|>
|> if (command == 0)
|> {
|> - internal_error ("%s: command not found", pathname);
^^^^^^^^^^^^^^^^^
|> + char buf[80];
|> + char *interpreter = extract_hash_bang_interpreter (pathname, buf);
|> +
|> + internal_error ("%s: command not found: `%s'", pathname,
|> + interpreter);
|> +
|> exit (EX_NOTFOUND); /* Posix.2 says the exit status is 127 */
|> }

Andreas.

--
Andreas Schwab "And now for something
SuSE Labs completely different."
[email protected]
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5

2001-03-05 13:41:06

by Richard B. Johnson

Subject: Re: binfmt_script and ^M

On 5 Mar 2001, Jan Nieuwenhuizen wrote:

> Pavel Machek <[email protected]> writes:
>
> > > $ head -1 testscript
> > > #!/bin/sh
> > > $ ./testscript
> > > bash: ./testscript: No such file or directory
> >
> > What kernel wants to say is "/usr/bin/perl\r: no such file". Saying ENOEXEC
> > would be even more confusing.
>
> So, why don't we make bash say that, then? As I guess that we've all
> been bitten by this before.
>
> What are the chances for something like this to be included?
>
> Greetings,
> Jan.
>
[SNIPPED...]

So why would you even consider breaking bash as a work-around for
a broken script?

Somebody must have missed the boat entirely. Unix does not, never
has, and never will end a text line with '\r'. It's Microsoft junk
that does that, a throwback to CP/M, a throwback to MDS/200.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-03-05 14:55:36

by John Kodis

Subject: Re: binfmt_script and ^M

On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:

> Somebody must have missed the boat entirely. Unix does not, never
> has, and never will end a text line with '\r'.

Unix does not, never has, and never will end a text line with ' ' (a
space character) or with \t (a tab character). Yet if I begin a shell
script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
stripped and /bin/sh gets exec'd. Since \r has no special significance
to Unix, I'd expect it to be treated the same as any other whitespace
character -- it should be stripped, and /bin/sh should get exec'd.

--
John Kodis <[email protected]>
Phone: 301-286-7376

2001-03-05 14:57:16

by Jan Nieuwenhuizen

Subject: [PATCH]: print missing interpreter name [Was: Re: binfmt_script and ^M]

"Richard B. Johnson" <[email protected]> writes:

> So why would you even consider breaking bash as a work-around for
> a broken script?

I don't.

> Somebody must have missed the boat entirely. Unix does not, never
> has, and never will end a text line with '\r'. It's Microsoft junk
> that does that, a throwback to CP/M, a throwback to MDS/200.

Yes, we all know that, but you missed the point. As far as this patch
goes, it's got nothing to do with the '\r'. It's meant to get a more
informative error message from bash, if ``#!INTERPRETER'' does not
exist. Look:

$ cat /bin/foo.sh
#!/foo/bar/baz
echo bar
$ /bin/bash -c /bin/foo.sh
/bin/bash: /bin/foo.sh: No such file or directory
$ ./build/bash -c /bin/foo.sh
./build/bash: /foo/bar: No such file or directory

Maybe the message could even be better, but having `/foo/bar' printed,
i.e., the file that the kernel says does not exist, instead of `/bin/foo.sh',
the name of the script, which certainly does exist, may help. Possibly
both should be printed.

Greetings,
Jan.

I made a silly mistake in the previous patch, sorry.

--- ../bash-2.04.orig/ChangeLog Mon Mar 5 13:58:48 2001
+++ ./ChangeLog Mon Mar 5 15:28:19 2001
@@ -0,0 +1,5 @@
+2001-03-05 Jan Nieuwenhuizen <[email protected]>
+
+ * execute_cmd.c (extract_hash_bang_interpreter): New function.
+ (shell_execve): More informative error message.
+
--- ../bash-2.04.orig/execute_cmd.c Tue Jan 25 17:29:11 2000
+++ ./execute_cmd.c Mon Mar 5 15:29:37 2001
@@ -3035,6 +3035,42 @@
}
}

+/* Look for #!INTERPRETER in file COMMAND, and return INTERPRETER . */
+static char *
+extract_hash_bang_interpreter (char *command, char buf[80])
+{
+ int fd;
+ char *interpreter;
+
+ interpreter = "";
+ fd = open (command, O_RDONLY);
+ if (fd >= 0)
+ {
+ int len;
+
+ len = read (fd, (char *)buf, 80);
+ close (fd);
+
+ if (len > 0
+ && buf[0] == '#' && buf[1] == '!')
+ {
+ int i;
+ int start;
+
+ for (i = 2; whitespace (buf[i]) && i < len; i++)
+ ;
+
+ for (start = i;
+ !whitespace (buf[i]) && buf[i] != '\n' && i < len;
+ i++)
+ ;
+
+ interpreter = substring ((char *)buf, start, i);
+ }
+ }
+ return interpreter;
+}
+
/* Execute a simple command that is hopefully defined in a disk file
somewhere.

@@ -3326,6 +3362,11 @@
else
{
errno = i;
+ if (errno == ENOENT)
+ {
+ char buf[80];
+ command = extract_hash_bang_interpreter (command, buf);
+ }
file_error (command);
}
return ((i == ENOENT) ? EX_NOTFOUND : EX_NOEXEC); /* XXX Posix.2 says that exit status is 126 */

--
Jan Nieuwenhuizen <[email protected]> | GNU LilyPond - The music typesetter
http://www.xs4all.nl/~jantien | http://www.lilypond.org

2001-03-05 15:29:43

by Rik van Riel

Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, John Kodis wrote:
> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
>
> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'.
>
> Unix does not, never has, and never will end a text line with ' ' (a
> space character) or with \t (a tab character). Yet if I begin a shell
> script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
> stripped and /bin/sh gets exec'd. Since \r has no special significance
> to Unix, I'd expect it to be treated the same as any other whitespace
> character -- it should be stripped, and /bin/sh should get exec'd.

Makes sense, IMHO...

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2001-03-05 15:51:47

by Richard B. Johnson

Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, John Kodis wrote:

> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
>
> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'.
>
> Unix does not, never has, and never will end a text line with ' ' (a
> space character) or with \t (a tab character). Yet if I begin a shell
> script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
> stripped and /bin/sh gets exec'd. Since \r has no special significance
> to Unix, I'd expect it to be treated the same as any other whitespace
> character -- it should be stripped, and /bin/sh should get exec'd.
>

No. The '\n' character is interpreted. '\t' is expanded on stdout if
the terminal is "cooked". Other interpreted characters are:
^C, ^\, ^U, ^D, ^@, ^Q, ^S, ^Z, ^R, ^O, ^W, ^V

These are all upper-case ASCII letter codes minus 64. The erase (VERASE)
is special and is '?' + 64.

The '\r' (^R) definitely has special significance to Unix. It's called
"VREPRINT", in the termios structure member "c_cc". If it exists in
a file instead of an output stream, these characters are not interpreted.
All files in Unix are binary. It's the 'C' runtime library that may
interpret file contents for programmer convenience. For instance,
fgets() reads until the new-line character. If you happen to have a
'\r' before that new-line, guess what? You will have a '\r' in your
string. If you were attempting to do:

ls foo\r

You will get a file-not found error unless you have a file with '\r'
as its last character.
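
The fgets() behaviour is easy to confirm with a short sketch. The helper name below is made up for the example; it reads one line and reports whether a '\r' was left in front of the newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative helper: read one line from FP into BUF with fgets().
 * Returns 1 if the line ends in "\r\n", 0 if it does not, and -1 on
 * end of file or read error.  BUF is left exactly as fgets() filled it. */
int line_has_cr(FILE *fp, char *buf, int size)
{
    assert(fp != NULL && buf != NULL && size > 0);
    if (fgets(buf, size, fp) == NULL)
        return -1;

    size_t len = strlen(buf);
    return len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n';
}
```

When it returns 1, the '\r' is still in the string, and any file name built from that string carries it along.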

There is really no such thing as "whitespace" in Unix compatible text.
For instance, the text in a Makefile MUST use the tab character as a
separator. Spaces won't do.


Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-03-05 16:01:10

by Richard B. Johnson

Subject: Re: [PATCH]: print missing interpreter name [Was: Re: binfmt_script and ^M]

On 5 Mar 2001, Jan Nieuwenhuizen wrote:

> "Richard B. Johnson" <[email protected]> writes:
>
> > So why would you even consider breaking bash as a work-around for
> > a broken script?
>
> I don't.
>
> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'. It's Microsoft junk
> > that does that, a throwback to CP/M, a throwback to MDS/200.
>
> Yes, we all know that, but you missed the point. As far as this patch
> goes, it's got nothing to do with the '\r'. It's meant to get a more
> informative error message from bash, if ``#!INTERPRETER'' does not
> exist. Look:
>
> $ cat /bin/foo.sh
> #!/foo/bar/baz
> echo bar
> $ /bin/bash -c /bin/foo.sh
> /bin/bash: /bin/foo.sh: No such file or directory
> $ ./build/bash -c /bin/foo.sh
> ./build/bash: /foo/bar: No such file or directory
>
> Maybe the message could even be better, but having `/foo/bar' printed,
> i.e., the file that the kernel says does not exist, instead of `/bin/foo.sh',
> the name of the script, which certainly does exist, may help. Possibly
> both should be printed.
>
> Greetings,
> Jan.

No. I did not miss the point. The 'No such file or directory' error
(when you can see the ^$^$)#@@*& filename with 'ls'), usually means
that there is something wrong with the file. Usually, I have found
that it was an executable linked against some other runtime library
than what I have. `strace` finds this quickly.

A common problem after a so-called upgrade. So, the bash error output
(without additional text) is consistent when there is something wrong with
the file-name as well.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-03-05 16:01:09

by Jeff Mcadams

Subject: Re: binfmt_script and ^M

Also sprach Rik van Riel
>On Mon, 5 Mar 2001, John Kodis wrote:
>> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
>> > Somebody must have missed the boat entirely. Unix does not, never
>> > has, and never will end a text line with '\r'.

>> Unix does not, never has, and never will end a text line with ' ' (a
>> space character) or with \t (a tab character). Yet if I begin a
>> shell script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white
>> space is stripped and /bin/sh gets exec'd. Since \r has no special
>> significance to Unix, I'd expect it to be treated the same as any
>> other whitespace character -- it should be stripped, and /bin/sh
>> should get exec'd.

>Makes sense, IMHO...

That only makes sense if:
#!/bin/shasdf\n
would also exec /bin/sh.

" " and \t are whitespace, \r is not whitespace.
--
Jeff McAdams Email: [email protected]
Head Network Administrator Voice: (502) 966-3848
IgLou Internet Services (800) 436-4456

2001-03-05 16:19:13

by Paul Flinders

Subject: Re: binfmt_script and ^M

Jeff Mcadams wrote:

> Also sprach Rik van Riel
> >On Mon, 5 Mar 2001, John Kodis wrote:
> >> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
> >> > Somebody must have missed the boat entirely. Unix does not, never
> >> > has, and never will end a text line with '\r'.
>
> >> Unix does not, never has, and never will end a text line with ' ' (a
> >> space character) or with \t (a tab character). Yet if I begin a
> >> shell script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white
> >> space is stripped and /bin/sh gets exec'd. Since \r has no special
> >> significance to Unix, I'd expect it to be treated the same as any
> >> other whitespace character -- it should be stripped, and /bin/sh
> >> should get exec'd.
>
> >Makes sense, IMHO...
>
> That only makes sense if:
> #!/bin/shasdf\n
> would also exec /bin/sh.

POSIX disagrees with you (according to the manual page)

$ man isspace
....
isspace()
checks for white-space characters. In the "C" and
"POSIX" locales, these are: space, form-feed
('\f'), newline ('\n'), carriage return ('\r'),
horizontal tab ('\t'), and vertical tab ('\v').
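
The gap between the two notions of whitespace under discussion can be checked directly. In this sketch `is_blank_separator` is a made-up stand-in for a check that accepts only space and tab; it is not taken from the kernel source.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical narrow check: only space and tab count as separators. */
int is_blank_separator(int c)
{
    return c == ' ' || c == '\t';
}

/* Returns 1 if the C library's isspace() and the narrow check above
 * disagree about C, 0 if they agree. */
int whitespace_checks_disagree(int c)
{
    assert(c >= 0 && c <= 255);     /* isspace() needs an unsigned char value */
    return (isspace(c) != 0) != (is_blank_separator(c) != 0);
}
```

In the default "C" locale the two checks agree on space and tab and disagree on '\r', '\n', '\f' and '\v'.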


2001-03-05 16:39:19

by Erik Hensema

Subject: Re: binfmt_script and ^M

On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
> On 5 Mar 2001, Jan Nieuwenhuizen wrote:

> > Pavel Machek <[email protected]> writes:

> > > > $ head -1 testscript
> > > > #!/bin/sh
> > > > $ ./testscript
> > > > bash: ./testscript: No such file or directory

> > > What kernel wants to say is "/usr/bin/perl\r: no such
> > > file". Saying ENOEXEC would be even more confusing.

> > So, why don't we make bash say that, then? As I guess that we've
> > all been bitten by this before.

> > What are the chances for something like this to be included?

> > Greetings, Jan.

> [SNIPPED...]

> So why would you even consider breaking bash as a work-around for a
> broken script?

User-friendliness.

> Somebody must have missed the boat entirely. Unix does not, never
> has, and never will end a text line with '\r'. It's Microsoft junk
> that does that, a throwback to CP/M, a throwback to MDS/200.

Yes, _we_ all know that. However, it's not really intuitive to the user
getting a 'No such file or directory' on a script he just created. Bash
doesn't say:
bash: testscript: Script interpreter not found
but bash says:
bash: testscript: No such file or directory

Maybe we should create a new errno: EINTERPRETER or something like that and
let the kernel return that instead of ENOENT.

Note that this has little to do with the \r\n problem but only with the
_real_ underlying reason: the script interpreter is not found and ENOENT is
returned, confusing the user: the user thinks the _script_ is not found,
while it's there, for sure.
--
Erik Hensema ([email protected])

2001-03-05 16:53:43

by H. Peter Anvin

Subject: Re: binfmt_script and ^M

Followup to: <[email protected]>
By author: "Richard B. Johnson" <[email protected]>
In newsgroup: linux.dev.kernel
>
> The '\r' (^R) definitely has special significance to Unix. It's called
> "VREPRINT", in the termios structure member "c_cc".
>

'\r' is ^M, not ^R.
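
For an upper-case letter, caret notation is the letter's code minus 64, so this is quick to verify. `CTRL_OF` and `caret_letter` below are written for the example and do not come from a system header.

```c
#include <assert.h>

/* Local macro for the example: the control code written as ^LETTER.
 * For upper-case letters, masking with 0x1f is the same as subtracting 64. */
#define CTRL_OF(letter) ((letter) & 0x1f)

/* Returns the letter used in caret notation for control code C
 * (for example 'M' for 13); C must be in the range 1..26. */
char caret_letter(int c)
{
    assert(c >= 1 && c <= 26);
    return (char)(c | 0x40);
}
```

`CTRL_OF('M')` is 13, which is '\r', while `CTRL_OF('R')` is 18.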

> There is really no such thing as "whitespace" in Unix compatible text.
> For instance, the text in a Makefile MUST use the tab character as a
> separator. Spaces won't do.

Whitespace is defined by POSIX as '\n', '\r', '\t', '\v', '\f' or ' '.
Occasionally, the specific *kind* of whitespace matters -- for
example, '\n' frequently has different behaviour, as does '\t' in
make.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2001-03-05 17:12:34

by Andreas Schwab

Subject: Re: binfmt_script and ^M

Paul Flinders <[email protected]> writes:

|> Jeff Mcadams wrote:
|>
|> > Also sprach Rik van Riel
|> > >On Mon, 5 Mar 2001, John Kodis wrote:
|> > >> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
|> > >> > Somebody must have missed the boat entirely. Unix does not, never
|> > >> > has, and never will end a text line with '\r'.
|> >
|> > >> Unix does not, never has, and never will end a text line with ' ' (a
|> > >> space character) or with \t (a tab character). Yet if I begin a
|> > >> shell script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white
|> > >> space is stripped and /bin/sh gets exec'd. Since \r has no special
|> > >> significance to Unix, I'd expect it to be treated the same as any
|> > >> other whitespace character -- it should be stripped, and /bin/sh
|> > >> should get exec'd.
|> >
|> > >Makes sense, IMHO...
|> >
|> > That only makes sense if:
|> > #!/bin/shasdf\n
|> > would also exec /bin/sh.
|>
|> POSIX disagrees with you (accd to the manual page)
|>
|> $ man isspace

This has no significance here. The right thing to look at is $IFS, which
does not contain \r by default. The shell only splits words by "IFS
whitespace", and the kernel should be consistent with it:

$ echo -e 'ls foo\r' | sh
ls: foo: No such file or directory

Andreas.

--
Andreas Schwab "And now for something
SuSE Labs completely different."
[email protected]
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5

2001-03-05 19:03:56

by Pozsar Balazs

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, Paul Flinders wrote:

> Jeff Mcadams wrote:
>
> > Also sprach Rik van Riel
> > >On Mon, 5 Mar 2001, John Kodis wrote:
> > >> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
> > >> > Somebody must have missed the boat entirely. Unix does not, never
> > >> > has, and never will end a text line with '\r'.
> >
> > >> Unix does not, never has, and never will end a text line with ' ' (a
> > >> space character) or with \t (a tab character). Yet if I begin a
> > >> shell script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white
> > >> space is stripped and /bin/sh gets exec'd. Since \r has no special
> > >> significance to Unix, I'd expect it to be treated the same as any
> > >> other whitespace character -- it should be stripped, and /bin/sh
> > >> should get exec'd.
> >
> > >Makes sense, IMHO...
> >
> > That only makes sense if:
> > #!/bin/shasdf\n
> > would also exec /bin/sh.
>
> POSIX disagrees with you (accd to the manual page)
>
> $ man isspace
> ....
> isspace()
> checks for white-space characters. In the "C" and
> "POSIX" locales, these are: space, form-feed
> ('\f'), newline ('\n'), carriage return ('\r'),
> horizontal tab ('\t'), and vertical tab ('\v').

And what does POSIX say about "#!/bin/sh\r" ?
In other words: should the kernel look for the interpreter between the !
and the newline, or [the first space or newline] or the first whitespace?

IMHO, the first whitespace. Which means that "#!/bin/sh\r" should invoke
/bin/sh. (though it is junk).

--

2001-03-05 19:15:20

by Jesse Pollard

[permalink] [raw]
Subject: Re: binfmt_script and ^M

John Kodis <[email protected]>:
> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
>
> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'.
>
> Unix does not, never has, and never will end a text line with ' ' (a
> space character) or with \t (a tab character). Yet if I begin a shell
> script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
> stripped and /bin/sh gets exec'd. Since \r has no special significance
> to Unix, I'd expect it to be treated the same as any other whitespace
> character -- it should be stripped, and /bin/sh should get exec'd.

Actually it does have some significance - it causes a return, then the
following text overwrites the current text. Granted, this is only used
occasionally for generating bold/underline/...

This is used in some formatters (troff) occasionally, though it tends to
use backspace now.

\r is not considered whitespace, though it should be possible to define
it that way. A line terminator is always \n.

Another point, is that the "#!/bin/sh" can have options added: it can be
"#!/bin/sh -vx" and the option -vx is passed to the shell. The space is
not just "stripped". It is used as a parameter separator. As such, the
"stripping" is only because the first parameter is separated from the
command by whitespace.

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-05 19:49:56

by Paul Flinders

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Andreas Schwab wrote:

> This [isspace('\r') == 1] has no significance here. The right thing to

> look at is $IFS, which does not contain \r by default. The shell only splits

> words by "IFS whitespace", and the kernel should be consistent with it:
>
> $ echo -e 'ls foo\r' | sh
> ls: foo: No such file or directory

The problem with that argument is that #!<interpreter> can be applied
to more than just shells which understand $IFS, so which environment
variable does the kernel pick?

It's a difficult one - logically white space should terminate the interpreter
name but the definition of what is, or isn't, white space is quite definitely
a user space issue. Unfortunately if you do use the user's locale to decide
you then open the possibility that whether a script works or not depends
on the locale - and that, surely, is equally unacceptable and deferring
to a user-space "script launcher" app is going to open yet more problems.

In the end I suspect that the only practical way out _is_ to say that the kernel
uses space (0x20) and tab (0x8) as white space and no other character.

This does miss the point though - whatever the rules are used to parse
the interpreter name it is still confusing when the error reported by the
shell is "No such file or directory". Especially when the file is sitting
in front of you. Would it be so bad to add an ENOINTERP error?



2001-03-05 20:08:43

by Paul Flinders

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Paul Flinders wrote:

> uses space (0x20) and tab (0x8) as white space and no other character.
>

<sigh> I mean, of course, tab (_0x9_)

I just checked - the kernel isspace() macro says that \r is whitespace.


2001-03-05 20:19:13

by Jan Nieuwenhuizen

[permalink] [raw]
Subject: [PATCH #3]: print missing interpreter name [Was: Re: binfmt_script and ^M]

"Richard B. Johnson" <[email protected]> writes:

> No. I did not miss the point. The 'No such file or directory' error
> (when you can see the ^$^$)#@@*& filename with 'ls'), usually means
> that there is something wrong with the file.

Now, let's see. When this error happens, it can be one of these:

1a) the script itself is broken (eg, it has `#!/bin/bash\r')
1b) the interpreter is missing, (eg, the script has
`#!/usr/bin/perl', while perl got removed during the
grand perl repackaging)
2) the interpreter is broken, (eg, missing shared library)
3) the script does not exist

In any case, the heart of the problems 1) and 2) is expressed by the
fact that there's something fishy with #!INTERPRETER. In the case of
1a, the script should be fixed, in case 1b, the system should be
fixed, but this doesn't really matter. Either way: INTERPRETER is the
file the kernel is talking about when it says ENOENT. Only in case 3),
the ENOENT is about the script itself.

It seems reasonable, and helpful, for bash to print the name of the
INTERPRETER.

> Usually, I have found that it was an executable linked against some
> other runtime library than what I have. `strace` finds this quickly.

Well, this doesn't mean very much, other than you happen to experience
2) most (and you're fast at identifying the strange error). But even
in this case, the name of the broken interpreter is what you should be
warned about. Maybe others usually experience 1b), who cares?

In this third incarnation of my patch (today really isn't my day), you
see these messages for 1-3:

1)
$ cat /bin/foo.sh
#!/foo/bar/baz
echo bar

$ /bin/bash -c /bin/foo.sh
/bin/bash: /bin/foo.sh: No such file or directory

$ ./new/bash -c /bin/foo.sh
./new/bash: /bin/foo.sh: /foo/bar/baz: No such file or directory

2)
$ cat /bin/no-ld.sh
#!/usr/bin/urg
$ ldd /bin/urg
libkpathsea.so.3 => not found
[..]

$ /bin/bash -c /bin/no-ld.sh
/bin/bash: /bin/no-ld.sh: No such file or directory

$ ./new/bash -c /bin/no-ld.sh
./new/bash: /bin/no-ld.sh: /usr/bin/urg: No such file or directory

3)
$ /bin/bash -c /bin/no-such-foo.sh
/bin/bash: /bin/no-such-foo.sh: No such file or directory

$ ./new/bash -c /bin/no-such-foo.sh
./new/bash: /bin/no-such-foo.sh: No such file or directory


Ok, here goes try #3:

--- ../bash-2.04.orig/ChangeLog Mon Mar 5 21:06:11 2001
+++ ./ChangeLog Mon Mar 5 19:22:46 2001
@@ -0,0 +1,5 @@
+2001-03-05 Jan Nieuwenhuizen <[email protected]>
+
+ * execute_cmd.c (extract_hash_bang_interpreter): New function.
+ (shell_execve): More informative error message.
+
--- ../bash-2.04.orig/execute_cmd.c Tue Jan 25 17:29:11 2000
+++ ./execute_cmd.c Mon Mar 5 20:35:46 2001
@@ -3035,6 +3035,42 @@
}
}

+/* Look for #!INTERPRETER in file COMMAND, and return INTERPRETER . */
+static char *
+extract_hash_bang_interpreter (char *command, char buf[80])
+{
+ int fd;
+ char *interpreter;
+
+ interpreter = "";
+ fd = open (command, O_RDONLY);
+ if (fd >= 0)
+ {
+ int len;
+
+ len = read (fd, (char *)buf, 80);
+ close (fd);
+
+ if (len > 1
+ && buf[0] == '#' && buf[1] == '!')
+ {
+ int i;
+ int start;
+
+ for (i = 2; i < len && whitespace (buf[i]); i++)
+ ;
+
+ for (start = i;
+ i < len && !whitespace (buf[i]) && buf[i] != '\n';
+ i++)
+ ;
+
+ interpreter = substring ((char *)buf, start, i);
+ }
+ }
+ return interpreter;
+}
+
/* Execute a simple command that is hopefully defined in a disk file
somewhere.

@@ -3326,7 +3362,17 @@
else
{
errno = i;
- file_error (command);
+ if (errno == ENOENT)
+ {
+ char buf[80];
+ char *interpreter = extract_hash_bang_interpreter (command, buf);
+ if (strlen (interpreter))
+ sys_error ("%s: %s", command, interpreter);
+ else
+ file_error (command);
+ }
+ else
+ file_error (command);
}
return ((i == ENOENT) ? EX_NOTFOUND : EX_NOEXEC); /* XXX Posix.2 says that exit status is 126 */
}


--
Jan Nieuwenhuizen <[email protected]> | GNU LilyPond - The music typesetter
http://www.xs4all.nl/~jantien | http://www.lilypond.org

2001-03-05 20:40:07

by robert read

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, Mar 05, 2001 at 07:58:52PM +0100, Pozsar Balazs wrote:
>
> And what does POSIX say about "#!/bin/sh\r" ?
> In other words: should the kernel look for the interpreter between the !
> and the newline, or [the first space or newline] or the first whitespace?
>
> IMHO, the first whitespace. Which means that "#!/bin/sh\r" should invoke
> /bin/sh. (though it is junk).
>

The line terminator, '\n', is what terminates the interpreter. White
space (in this case, only ' ' and '\t') is used to separate the
arguments to the interpreter. This allows scripts to pass args to
interpreters, as in #!/usr/bin/perl -w or #!/usr/bin/env perl -w

So is '\r' a line terminator? For Linux, no. Should '\r' separate
arguments? No, that would be very strange.
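
The rule described above can be sketched in userspace C. This is only an
illustration modelled loosely on the logic visible in the binfmt_script.c
hunk at the top of the thread, not the kernel code itself; struct shebang
and split_shebang are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: both pointers point into the caller's buffer. */
struct shebang {
    char *interp;   /* interpreter path, or NULL if the line is unusable */
    char *arg;      /* the single optional argument string, or NULL */
};

/* Split "#!interp [arg]\n" in place. Only ' ' and '\t' separate fields and
   '\n' ends the line. A '\r' is treated like any other byte. */
struct shebang split_shebang(char *line)
{
    struct shebang r = { NULL, NULL };
    char *cp;

    assert(line != NULL);
    if (line[0] != '#' || line[1] != '!')
        return r;

    /* Cut the line at the first newline, if there is one. */
    cp = strchr(line, '\n');
    if (cp == NULL)
        cp = line + strlen(line);
    *cp = '\0';

    /* Strip trailing spaces and tabs, never stepping before line + 2. */
    while (cp > line + 2 && (cp[-1] == ' ' || cp[-1] == '\t'))
        *--cp = '\0';

    /* Skip blanks between "#!" and the interpreter name. */
    for (cp = line + 2; *cp == ' ' || *cp == '\t'; cp++)
        ;
    if (*cp == '\0')
        return r;
    r.interp = cp;

    /* The name ends at the first blank; the rest is one argument. */
    while (*cp != '\0' && *cp != ' ' && *cp != '\t')
        cp++;
    while (*cp == ' ' || *cp == '\t')
        *cp++ = '\0';
    if (*cp != '\0')
        r.arg = cp;
    return r;
}
```

With this rule "#!/usr/bin/env perl -w\n" yields the interpreter
/usr/bin/env and the single argument "perl -w", while "#!/bin/sh\r\n"
yields an interpreter name of "/bin/sh\r", which is the name that is then
not found.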

robert

2001-03-05 21:07:13

by Pozsar Balazs

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, Robert Read wrote:
> On Mon, Mar 05, 2001 at 07:58:52PM +0100, Pozsar Balazs wrote:
> >
> > And what does POSIX say about "#!/bin/sh\r" ?
> > In other words: should the kernel look for the interpreter between the !
> > and the newline, or [the first space or newline] or the first whitespace?
> >
> > IMHO, the first whitespace. Which means that "#!/bin/sh\r" should invoke
> > /bin/sh. (though it is junk).
>
> The line terminator, '\n', is what terminates the interpreter. White
> space (in this case, only ' ' and '\t') is used to separate the
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> arguments to the interpreter.


The last little tiny thing that bothers me: why? Why only ' ' and '\t' _in
this case_? As someone mentioned, even isspace() reports '\r' as whitespace.

A possible answer (that I can think of) is that those are the whitespace
characters in IFS (as said previously), which takes us out of kernel-space
and into userspace. But IMHO we shouldn't define another set of whitespace
for this case; can't we just use what isspace() says?

(okay, I'm not for this '\r' thingy, I just want to see the reasons.)

--
Balazs Pozsar.

2001-03-05 21:16:55

by Andries E. Brouwer

[permalink] [raw]
Subject: Re: binfmt_script and ^M

> And what does POSIX say about "#!/bin/sh\r" ?

Nothing at all. The #! construction is not part of any standard
right now. The implementation is messy - different operating systems
do vaguely similar things, but all details differ.
Linux can do whatever it wants.
Of course it helps portability if we stay close to what other OSs do.

There is some discussion at
http://www.cwi.nl/~aeb/std/hashexclam-1.html
Additions and corrections welcome.

In this particular case I have no strong opinion,
but would not object to removing the '\r'.

The standard defines whitespace in the POSIX locale, as one or more
<blank>s (<space>s and <tab>s), <newline>s, <carriage-return>s,
<form-feed>s, and <vertical-tab>s.
Some systems strip the #! line for trailing whitespace, some don't.
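
What "stripping the #! line for trailing whitespace" could look like is
sketched below, assuming the isspace() definition of whitespace in the
"C" locale. The function name strip_trailing_space is hypothetical, and
this is a userspace illustration rather than a proposed kernel change.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove trailing whitespace, as defined by isspace(),
   from a NUL-terminated line. Returns the new length. */
size_t strip_trailing_space(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    /* Test len > 0 first so line[len - 1] never reads before the buffer. */
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';
    return len;
}
```

Applied to "#!/bin/sh\r\n" this leaves "#!/bin/sh". The length check
comes first so that a line consisting only of '\n' cannot make the loop
look at the byte before the buffer.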

Andries

2001-03-05 21:49:55

by Dr. Kelsey Hudson

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, John Kodis wrote:

> On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
>
> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'.
>
> Unix does not, never has, and never will end a text line with ' ' (a
> space character) or with \t (a tab character). Yet if I begin a shell
> script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
> stripped and /bin/sh gets exec'd. Since \r has no special significance
> to Unix, I'd expect it to be treated the same as any other whitespace
> character -- it should be stripped, and /bin/sh should get exec'd.

umm, last i checked a carriage return wasn't whitespace...
space, horizontal tab, vertical tab, form feed constitute whitespace
IIRC...

Kelsey Hudson [email protected]
Software Engineer
Compendium Technologies, Inc (619) 725-0771
---------------------------------------------------------------------------

2001-03-05 22:35:29

by robert read

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, Mar 05, 2001 at 10:05:36PM +0100, Pozsar Balazs wrote:
> On Mon, 5 Mar 2001, Robert Read wrote:
> > On Mon, Mar 05, 2001 at 07:58:52PM +0100, Pozsar Balazs wrote:
> > >
> > > And what does POSIX say about "#!/bin/sh\r" ?
> > > In other words: should the kernel look for the interpreter between the !
> > > and the newline, or [the first space or newline] or the first whitespace?
> > >
> > > IMHO, the first whitespace. Which means that "#!/bin/sh\r" should invoke
> > > /bin/sh. (though it is junk).
> >
> > The line terminator, '\n', is what terminates the interpreter. White
> > space (in this case, only ' ' and '\t') is used to separate the
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > arguments to the interpreter.
>
>
> The last little tiny thing that bothers me: why? Why only ' ' and '\t' _in
> this case_? As someone mentioned, even isspace() reports '\r' as whitespace.
>
> A possible answer (that I can think of) is that those are the whitespace
> characters in IFS (as said previously), which takes us out of kernel-space
> and into userspace. But IMHO we shouldn't define another set of whitespace
> for this case; can't we just use what isspace() says?

And isspace('\n') is also true. At question here is not the
definition of whitespace. The question is, what is the definition of
a command line? What characters are valid command line separators?

robert

2001-03-06 02:19:10

by Richard B. Johnson

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, Robert Read wrote:

> On Mon, Mar 05, 2001 at 07:58:52PM +0100, Pozsar Balazs wrote:
> >
> > And what does POSIX say about "#!/bin/sh\r" ?
> > In other words: should the kernel look for the interpreter between the !
> > and the newline, or [the first space or newline] or the first whitespace?
> >
> > IMHO, the first whitespace. Which means that "#!/bin/sh\r" should invoke
> > /bin/sh. (though it is junk).
> >
>
> The line terminator, '\n', is what terminates the interpreter. White
> space (in this case, only ' ' and '\t') is used to separate the
> arguments to the interpreter. This allows scripts to pass args to
> interpreters, as in #!/usr/bin/perl -w or #!/usr/bin/env perl -w
>
> So is '\r' a line terminator? For Linux, no. Should '\r' separate
> arguments? No, that would be very strange.
>

For research, I suggest a look at getopt(3) or whatever it's called.
The command line args are separated into chunks based upon what got
separated into argv[1]....[n], delimited by (hold my breath) white-space.

Of course, it's not getopt(), but "makeopt()" that we are looking for.
...Whatever chopped up those command-line arguments in the first place.

If __that__ corresponds to POSIX rules, then whatever follows must
also comply.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (799.53 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.


2001-03-06 10:41:26

by Andreas Schwab

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Paul Flinders <[email protected]> writes:

|> Andreas Schwab wrote:
|>
|> > This [isspace('\r') == 1] has no significance here. The right thing to
|>
|> > look at is $IFS, which does not contain \r by default. The shell only splits
|>
|> > words by "IFS whitespace", and the kernel should be consistent with it:
|> >
|> > $ echo -e 'ls foo\r' | sh
|> > ls: foo: No such file or directory
|>
|> The problem with that argument is that #!<interpreter> can be applied
|> to more than just shells which understand $IFS, so which environment
|> variable does the kernel pick?

The kernel should use the same default value of IFS as the Bourne shell,
ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
independent of any settings in the environment.

|> It's a difficult one - logically white space should terminate the interpreter

No, IFS-whitespace delimits arguments in the Bourne shell.

Andreas.

--
Andreas Schwab "And now for something
SuSE Labs completely different."
[email protected]
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5

2001-03-06 10:51:27

by Pavel Machek

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Hi!

> > Somebody must have missed the boat entirely. Unix does not, never
> > has, and never will end a text line with '\r'. It's Microsoft junk
> > that does that, a throwback to CP/M, a throwback to MDS/200.
>
> Yes, _we_ all know that. However, it's not really intuitive to the user
> getting a 'No such file or directory' on a script he just created. Bash
> doesn't say:
> bash: testscript: Script interpreter not found
> but bash says:
> bash: testscript: No such file or directory
>
> Maybe we should create a new errno: EINTERPRETER or something like that and
> let the kernel return that instead of ENOENT.

Agreed, EINTERPRETER would be very nice, plus maybe EDYNLINKER. We
already have 'level 3 stopped', so this should not hurt :-)).
Pavel

--
I'm [email protected]. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at [email protected]

2001-03-06 12:34:25

by Paul Flinders

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Andreas Schwab wrote:

> Paul Flinders <[email protected]> writes:
>
> |> Andreas Schwab wrote:
> |>
> |> > This [isspace('\r') == 1] has no significance here. The right thing to
> |>
> |> > look at is $IFS, which does not contain \r by default. The shell only splits
> |>
> |> > words by "IFS whitespace", and the kernel should be consistent with it:
> |> >
> |> > $ echo -e 'ls foo\r' | sh
> |> > ls: foo: No such file or directory
> |>
> |> The problem with that argument is that #!<interpreter> can be applied
> |> to more than just shells which understand $IFS, so which environment
> |> variable does the kernel pick?
>
> The kernel should use the same default value of IFS as the Bourne shell,
> ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> independent of any settings in the environment.
>
> |> It's a difficult one - logically white space should terminate the interpreter
>
> No, IFS-whitespace delimits arguments in the Bourne shell.

Way back whenever processing #! was moved from the
shell to the kernel** this argument would have made sense -
today I'm not so sure.

But I'm quite happy for the kernel to use just space and
tab if it wishes, or anything else for that matter but it _is_
confusing that the error code doesn't distinguish problems
with the script from problems with the interpreter.

**Did linux ever rely on the shell for this?

2001-03-06 13:56:12

by Jesse Pollard

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Andreas Schwab <[email protected]>:
> Paul Flinders <[email protected]> writes:
>
> |> Andreas Schwab wrote:
> |>
> |> > This [isspace('\r') == 1] has no significance here. The right thing to
> |>
> |> > look at is $IFS, which does not contain \r by default. The shell only splits
> |>
> |> > words by "IFS whitespace", and the kernel should be consistent with it:
> |> >
> |> > $ echo -e 'ls foo\r' | sh
> |> > ls: foo: No such file or directory
> |>
> |> The problem with that argument is that #!<interpreter> can be applied
> |> to more than just shells which understand $IFS, so which environment
> |> variable does the kernel pick?
>
> The kernel should use the same default value of IFS as the Bourne shell,
> ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> independent of any settings in the environment.
>
> |> It's a difficult one - logically white space should terminate the interpreter
>
> No, IFS-whitespace delimits arguments in the Bourne shell.

IFS can be defined in the environment.

The kernel cannot use that definition because it introduces buffer limits
and a potential overflow. Besides, the kernel can run scripts from
applications that may not have or pass IFS, or its equivalent in whatever
shell is being used (I seem to remember an Icon shell that used commas).

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-06 14:39:43

by Laramie Leavitt

[permalink] [raw]
Subject: RE: binfmt_script and ^M

> Andreas Schwab wrote:
> > Paul Flinders <[email protected]> writes:
> > |> Andreas Schwab wrote:
> > |>
> > |> > This [isspace('\r') == 1] has no significance here. The
> right thing to
> > |>
> > |> > look at is $IFS, which does not contain \r by default.
> The shell only splits
> > |>
> > |> > words by "IFS whitespace", and the kernel should be
> consistent with it:
> > |> >
> > |> > $ echo -e 'ls foo\r' | sh
> > |> > ls: foo: No such file or directory
> > |>
> > |> The problem with that argument is that #!<interpreter> can be applied
> > |> to more than just shells which understand $IFS, so which environment
> > |> variable does the kernel pick?
> >
> > The kernel should use the same default value of IFS as the Bourne shell,
> > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> > independent of any settings in the environment.
> >
> > |> It's a difficult one - logically white space should
> terminate the interpreter
> >
> > No, IFS-whitespace delimits arguments in the Bourne shell.
>
> Way back whenever processing #! was moved from the
> shell to the kernel** this argument would have made sense -
> today I'm not so sure.
>
> But I'm quite happy for the kernel to use just space and
> tab if it wishes, or anything else for that matter but it _is_
> confusing that the error code doesn't distinguish problems
> with the script from problems with the interpreter.
>
> **Did linux ever rely on the shell for this?

Maybe the correct answer would be to create a proc entry for this.
That would allow the user to decide what is whitespace on his machine,
since nobody here appears to agree.

User: hmm... Wonder what happens if I do the following
%cat '$#! \n\t\r' > /proc/whitespace
later, % config.sh : Error file not found.
Oops, bug report... ;-)

Laramie

2001-03-06 14:55:15

by Andreas Schwab

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Jesse Pollard <[email protected]> writes:

|> Andreas Schwab <[email protected]>:
|> > Paul Flinders <[email protected]> writes:
|> >
|> > |> Andreas Schwab wrote:
|> > |>
|> > |> > This [isspace('\r') == 1] has no significance here. The right thing to
|> > |>
|> > |> > look at is $IFS, which does not contain \r by default. The shell only splits
|> > |>
|> > |> > words by "IFS whitespace", and the kernel should be consistent with it:
|> > |> >
|> > |> > $ echo -e 'ls foo\r' | sh
|> > |> > ls: foo: No such file or directory
|> > |>
|> > |> The problem with that argument is that #!<interpreter> can be applied
|> > |> to more than just shells which understand $IFS, so which environment
|> > |> variable does the kernel pick?
|> >
|> > The kernel should use the same default value of IFS as the Bourne shell,
|> > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
|> > independent of any settings in the environment.
|> >
|> > |> It's a difficult one - logically white space should terminate the interpreter
|> >
|> > No, IFS-whitespace delimits arguments in the Bourne shell.
|>
|> IFS can be defined in the environment.

No, the shell won't import it.

Andreas.

--
Andreas Schwab "And now for something
SuSE Labs completely different."
[email protected]
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5

2001-03-06 15:06:49

by Jeff Coy

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Mon, 5 Mar 2001, Robert Read wrote:

> And isspace('\n') is also true. At question here is not the
> definition of whitespace. The question is, what is the definition of
> a command line? What characters are valid command line separators?
>

It doesn't seem likely that '\r' is going to be accepted into the general
kernel. I personally don't think the issue affects enough *nix systems
for this to be a big issue.

I used to work at an ISP where I maintained a Slowaris box with about 600
websites on it; this issue came up frequently with customers uploading
scripts in binary mode trying to run #!/usr/bin/perl^M. The solution for
me was to just do the following:

cd /usr/bin
sudo ln -s perl perl^V^M

and it effectively solved the issue. I haven't looked at slowaris 8, but
slowaris 7 still refuses to run #!/usr/bin/perl^M. Why not just use a
simple one-time solution that will solve the problem & is portable to
other OS's?

Jeff
--
Resisting temptation is easier when you think you'll probably get
another chance later on.

2001-03-06 15:17:30

by Sean Hunter

[permalink] [raw]
Subject: Re: binfmt_script and ^M


I propose
/proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude

Any support?

Sean

On Tue, Mar 06, 2001 at 02:45:51PM -0000, Laramie Leavitt wrote:
> > Andreas Schwab wrote:
> > > Paul Flinders <[email protected]> writes:
> > > |> Andreas Schwab wrote:
> > > |>
> > > |> > This [isspace('\r') == 1] has no significance here. The
> > right thing to
> > > |>
> > > |> > look at is $IFS, which does not contain \r by default.
> > The shell only splits
> > > |>
> > > |> > words by "IFS whitespace", and the kernel should be
> > consistent with it:
> > > |> >
> > > |> > $ echo -e 'ls foo\r' | sh
> > > |> > ls: foo: No such file or directory
> > > |>
> > > |> The problem with that argument is that #!<interpreter> can be applied
> > > |> to more than just shells which understand $IFS, so which environment
> > > |> variable does the kernel pick?
> > >
> > > The kernel should use the same default value of IFS as the Bourne shell,
> > > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> > > independent of any settings in the environment.
> > >
> > > |> It's a difficult one - logically white space should
> > terminate the interpreter
> > >
> > > No, IFS-whitespace delimits arguments in the Bourne shell.
> >
> > Way back whenever processing #! was moved from the
> > shell to the kernel** this argument would have made sense -
> > today I'm not so sure.
> >
> > But I'm quite happy for the kernel to use just space and
> > tab if it wishes, or anything else for that matter but it _is_
> > confusing that the error code doesn't distinguish problems
> > with the script from problems with the interpreter.
> >
> > **Did linux ever rely on the shell for this?
>
> Maybe the correct answer would be to create a proc entry for this.
> That would allow the user to decide what is whitespace on his machine,
> since nobody here appears to agree.
>
> User: hmm... Wonder what happens if I do the following
> %cat '$#! \n\t\r' > /proc/whitespace
> later, % config.sh : Error file not found.
> Oops, bug report... ;-)
>
> Laramie

2001-03-06 15:37:59

by David Weinehall

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Tue, Mar 06, 2001 at 03:12:42PM +0000, Sean Hunter wrote:
>
> I propose
> /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude
>
> Any support?

<sarcasm>
Hey, let's go even further! Let's add support in all programs for \r\n.
And why not make all programs use filenames that have an 8+3 char garbled
equivalent where the last 3 are the indicators of the filetype. Oh, and
let's do everything to make sure the user doesn't leave Gnome/KDE.
And of course, let's add new features to all existing protocols and
other standards to make them "superior" to other implementations.
Oh, and of course, we must require an extra 64 MB of memory and
500 MB of diskspace for each release, and a 200MHz faster processor.
And let us do all system settings through a registry.

OH! Let's change the name of the operating-system to something more
catchy. Hmmm. Let's see. Windows maybe...
</sarcasm>


/David
_ _
// David Weinehall <[email protected]> /> Northern lights wander \\
// Project MCA Linux hacker // Dance across the winter sky //
\> http://www.acc.umu.se/~tao/ </ Full colour fire </

2001-03-06 15:56:30

by Jesse Pollard

[permalink] [raw]
Subject: Re: binfmt_script and ^M

--------- Received message begins Here ---------

>
> Jesse Pollard <[email protected]> writes:
>
> |> Andreas Schwab <[email protected]>:
> |> > Paul Flinders <[email protected]> writes:
> |> >
> |> > |> Andreas Schwab wrote:
> |> > |>
[snip]
> |> > |> It's a difficult one - logically white space should terminate the interpreter
> |> >
> |> > No, IFS-whitespace delimits arguments in the Bourne shell.
> |>
> |> IFS can be defined in the environment.
>
> No, the shell won't import it.

I wasn't directly referring to the Bourne shell or bash. The "shell" is whatever
program is specified on the "#!<shellprog>". The arbitrary <shellprog> could
import it, depending on the definition. (IFS=....; export IFS). If <shellprog>
imports it then it could be used, but only after the <shellprog> is running.

By default, IFS is a non-exported environment variable. It can be exported
to other programs if desired, those other programs can be used in
"#!<shellprog>" constructs.

And some systems do import IFS. The IRIX Bourne shell will import it:

tomcat 54% sh
$ echo "..${IFS}.." --(default: space, tab and \n)
..
..
$ IFS=" ^M" --(space, tab and \r)
$ export IFS
$ echo "..${IFS}.."
..
$ sh -- new subshell
$ echo "..${IFS}.."
.. -- space, tab and \r.
$

The same test on bash shows that it will not import it:

bash-2.04$ bash
bash-2.04$ echo "..${IFS}.."
..
..
bash-2.04$ IFS=" ^M"
bash-2.04$ export IFS
bash-2.04$ echo "..${IFS}.."
..
bash-2.04$ bash
bash-2.04$ echo "..${IFS}.."
..
..
bash-2.04$

The same test done on ash shows that it will import it:

bash-2.04$ ash
$ echo "..${IFS}.."
..
..
$ IFS=" ^M"
$ export IFS
$ echo "..${IFS}.."
..
$ ash
$ echo "..${IFS}.."
..
$

The csh shell, on the other hand, doesn't use IFS. It always uses blank or tab
unless they are escaped with \ or are enclosed in quotes.

Personally, I wouldn't want to change the kernel. There is no good way to
determine which error should be given: the script doesn't exist, or the
shell doesn't exist. Either may be nonexistent, and there are only two
possibilities. First do an "ls -l" on the script (it must be readable as well
as executable); second do an "ls -l 'line'", where line is the first line
of the shell script, minus the "#!". Sometimes it takes a vi/emacs/...
session to look for any funny characters.

The first case is that the shell script doesn't exist. This is reported
by the users command interpreter. The second case is reported by the
kernel. If all that is wanted is to change the format of the message, then
that should be doable - it has to report a different error than "No such
file..." which is the standard error status for this error. Just because the
file that isn't found is the shell program is no reason to change the status -
it really IS "No such file...". There will be some programs that depend
on this status return (menu/window managers come to mind) to issue an
appropriate status.

If the error is "Permission denied", then the equivalent situation exists.
The difference is that the "ls" alone is enough to determine why. (permission
may be denied for the shell program as well as the script).

A case can be made that the shell programs (bash/ash/csh...) do not do a
complete analysis of the exit status. All they appear to do is a "perror".

If the command does exist, then assume it is the shell program that is
missing?? I would implement this by doing a "stat" on the command path (if
it doesn't exist/permission denied/whatever - issue message about the command
path), then do the exec, followed by the return status analysis. Of course
this isn't easy when using execl, which is why it isn't done - perhaps a
change to the exec.. library functions?

This is a user mode issue and not a kernel issue.
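
The user-space check described above (stat the command, exec, then analyse
the failure) could look roughly like the following C sketch. It is only an
illustration under stated assumptions, not code from any existing shell: the
names explain_enoent and enum enoent_cause are invented here, the "#!" line is
assumed to fit in 128 bytes, and only space and tab are treated as separators.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Possible explanations for an exec that failed with ENOENT. */
enum enoent_cause { CAUSE_SCRIPT_MISSING, CAUSE_INTERP_MISSING, CAUSE_UNKNOWN };

/*
 * Hypothetical helper, meant to be called after exec*() has failed with
 * ENOENT. If the command itself cannot be stat'ed, blame the command.
 * Otherwise read its "#!" line and stat the interpreter named there.
 * The interpreter name (including any stray '\r') is copied into interp
 * so that the caller can mention it in the error message.
 */
enum enoent_cause explain_enoent(const char *path, char *interp, size_t len)
{
    struct stat st;
    char line[128];
    FILE *fp;
    size_t n;

    if (len > 0)
        interp[0] = '\0';
    if (stat(path, &st) != 0)
        return CAUSE_SCRIPT_MISSING;    /* command missing or unreachable */

    fp = fopen(path, "r");
    if (fp == NULL)
        return CAUSE_UNKNOWN;
    if (fgets(line, sizeof line, fp) == NULL)
        line[0] = '\0';
    fclose(fp);

    if (strncmp(line, "#!", 2) != 0)
        return CAUSE_UNKNOWN;           /* not a "#!" script */

    /* Interpreter name: skip blanks, then run to the next blank or newline. */
    const char *start = line + 2 + strspn(line + 2, " \t");
    n = strcspn(start, " \t\n");
    if (n == 0 || n >= len)
        return CAUSE_UNKNOWN;
    memcpy(interp, start, n);
    interp[n] = '\0';

    return stat(interp, &st) != 0 ? CAUSE_INTERP_MISSING : CAUSE_UNKNOWN;
}
```

A shell could call this after execve() returns -1 with errno set to ENOENT
and print something like "interpreter not found" instead of the bare
perror() text.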

-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2001-03-06 15:51:00

by James A Sutherland

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Tue, 6 Mar 2001, Sean Hunter wrote:

>
> I propose
> /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude
>
> Any support?

Hrm - make it part of the "fscking_moron" subsystem.

/proc/sys/kernel/fscking_moron/stupidity_workarounds/cant_handle_ascii/broken_shellscript_hack

It would be nice if the shell's error message gave a bit more information
on the problem, but that's a userspace issue IMHO: make *sh check for this
kind of mistake and say "interpreter for shell script not found",
something like that. Whatever, it's NOT a kernel issue.

Maybe the kernel could return a different error value, though - just
saying "file not found" isn't very helpful...


James.

2001-03-06 17:04:41

by Xavier Bestel

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Wouldn't it be easier to run the script interpreter through WINE? This
way we could work around several Win32 peculiarities, and users wouldn't
have to bother taking special steps when coding on their home PC.

Xav

On 06 Mar 2001 15:12:42 +0000, Sean Hunter wrote:
>
> I propose
> /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude
>
> Any support?
>
> Sean
>
> On Tue, Mar 06, 2001 at 02:45:51PM -0000, Laramie Leavitt wrote:
> > > Andreas Schwab wrote:
> > > > Paul Flinders <[email protected]> writes:
> > > > |> Andreas Schwab wrote:
> > > > |>
> > > > |> > This [isspace('\r') == 1] has no significance here. The right thing to
> > > > |> > look at is $IFS, which does not contain \r by default. The shell only splits
> > > > |> > words by "IFS whitespace", and the kernel should be consistent with it:
> > > > |> >
> > > > |> > $ echo -e 'ls foo\r' | sh
> > > > |> > ls: foo: No such file or directory
> > > > |>
> > > > |> The problem with that argument is that #!<interpreter> can be applied
> > > > |> to more than just shells which understand $IFS, so which environment
> > > > |> variable does the kernel pick?
> > > >
> > > > The kernel should use the same default value of IFS as the Bourne shell,
> > > > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> > > > independent of any settings in the environment.
> > > >
> > > > |> It's a difficult one - logically white space should terminate the interpreter
> > > >
> > > > No, IFS-whitespace delimits arguments in the Bourne shell.
> > >
> > > Way back whenever processing #! was moved from the
> > > shell to the kernel** this argument would have made sense -
> > > today I'm not so sure.
> > >
> > > But I'm quite happy for the kernel to use just space and
> > > tab if it wishes, or anything else for that matter but it _is_
> > > confusing that the error code doesn't distinguish problems
> > > with the script from problems with the interpreter.
> > >
> > > **Did linux ever rely on the shell for this?
> >
> > Maybe the correct answer would be to create a proc entry for this.
> > That allow the user to decide what is whitespace on his machine,
> > since nobody here appears to agree.
> >
> > User: hmm... Wonder what happens if I do the following
> > %cat '$#! \n\t\r' > /proc/whitespace
> > later, % config.sh : Error file not found.
> > Oops, bug report... ;-)
> >
> > Laramie
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

2001-03-06 18:15:33

by Peter Samuelson

[permalink] [raw]
Subject: Re: binfmt_script and ^M


[Jeff Coy]
> this issue came up frequently with customers uploading scripts in
> binary mode trying to run #!/usr/bin/perl^M. The solution for me was
> to just do the following:
>
> cd /usr/bin
> sudo ln -s perl perl^V^M

So none of your customers tried '#!/usr/bin/perl -w^M'? (Come on,
doesn't everyone use -w?)

I'm not for treating \r as IFS in the kernel, but the "simple one-time"
solution is not perfect..

Peter

2001-03-06 18:18:03

by Don Dugger

[permalink] [raw]
Subject: Re: binfmt_script and ^M (historical note)

Paul-

Minor historical note. The `#!' processing was never done by the
shell, this was always done in the kernel. Think about it,
the `#' character denotes a comment line, the shell ignores that
line. `#!' was used to create a way for the kernel to execute
a shell script directly. Since the kernel recognized the type of
executable based on a 16-bit magic number `#!' became a new magic
number that meant "break the remainder of the line into `program'
and `args' and then execute `program' with `args'".
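
As a rough user-space illustration of the splitting described above, here
is a minimal C sketch. It is not the kernel's binfmt_script code; the name
split_shebang is hypothetical, and it assumes that only space and tab
separate words and that everything after the interpreter name is handed
over as one single argument.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a "#!" line in place.
 * Returns 0 on success and sets *interp to the interpreter path and *arg
 * to the single optional argument (NULL if there is none). Only space and
 * tab count as separators, so a trailing '\r' stays glued to the last word.
 */
int split_shebang(char *line, char **interp, char **arg)
{
    char *cp;

    if (strncmp(line, "#!", 2) != 0)
        return -1;                      /* magic number does not match */

    /* Cut the line at the first newline, if there is one. */
    cp = strchr(line, '\n');
    if (cp == NULL)
        cp = line + strlen(line);
    *cp = '\0';

    /* Strip trailing spaces and tabs (but not '\r'). */
    while (cp > line + 2 && (cp[-1] == ' ' || cp[-1] == '\t'))
        *--cp = '\0';

    /* Skip blanks between "#!" and the interpreter name. */
    cp = line + 2;
    while (*cp == ' ' || *cp == '\t')
        cp++;
    if (*cp == '\0')
        return -1;                      /* no interpreter named */
    *interp = cp;

    /* The interpreter name ends at the next blank. */
    while (*cp != '\0' && *cp != ' ' && *cp != '\t')
        cp++;
    *arg = NULL;
    if (*cp != '\0') {
        *cp++ = '\0';
        while (*cp == ' ' || *cp == '\t')
            cp++;
        if (*cp != '\0')
            *arg = cp;                  /* rest of the line is one argument */
    }
    return 0;
}
```

Under these assumptions, "#!/usr/bin/perl -w\r\n" yields the interpreter
"/usr/bin/perl" with the argument "-w\r", while "#!/usr/bin/perl\r\n" yields
the interpreter "/usr/bin/perl\r", which is the name that cannot be found.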

A nice side effect of this is that it became a way to create shell
scripts that worked no matter what shell a user was running. For
efficiency, most shells just read and execute a shell script so
releasing a Bourne shell script to a group of CSH users created
problems. At the expense of a `fork' and `exec' the `#!' magic
number solved this problem.

On Tue, Mar 06, 2001 at 12:33:49PM +0000, Paul Flinders wrote:
> Andreas Schwab wrote:
>
> > Paul Flinders <[email protected]> writes:
> >
> > |> Andreas Schwab wrote:
> > |>
> > |> > This [isspace('\r') == 1] has no significance here. The right thing to
> > |> > look at is $IFS, which does not contain \r by default. The shell only splits
> > |> > words by "IFS whitespace", and the kernel should be consistent with it:
> > |> >
> > |> > $ echo -e 'ls foo\r' | sh
> > |> > ls: foo: No such file or directory
> > |>
> > |> The problem with that argument is that #!<interpreter> can be applied
> > |> to more than just shells which understand $IFS, so which environment
> > |> variable does the kernel pick?
> >
> > The kernel should use the same default value of IFS as the Bourne shell,
> > ie. the same value you'll get with /bin/sh -c 'echo "$IFS"'. This is
> > independent of any settings in the environment.
> >
> > |> It's a difficult one - logically white space should terminate the interpreter
> >
> > No, IFS-whitespace delimits arguments in the Bourne shell.
>
> Way back whenever processing #! was moved from the
> shell to the kernel** this argument would have made sense -
> today I'm not so sure.
>
> But I'm quite happy for the kernel to use just space and
> tab if it wishes, or anything else for that matter but it _is_
> confusing that the error code doesn't distinguish problems
> with the script from problems with the interpreter.
>
> **Did linux ever rely on the shell for this?

--
Don Dugger
"Censeo Toto nos in Kansa esse decisse." - D. Gale
[email protected]
Ph: 303/938-9838

2001-03-06 18:19:33

by Peter Samuelson

[permalink] [raw]
Subject: Re: binfmt_script and ^M


[Dr. Kelsey Hudson]
> umm, last i checked a carriage return wasn't whitespace... space,
> horizontal tab, vertical tab, form feed constitute whitespace IIRC...

Where and when did you check? Several sources disagree with you.

Peter

2001-03-06 18:26:33

by Jeff Coy

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Tue, 6 Mar 2001, Peter Samuelson wrote:

>
> [Jeff Coy]
> > this issue came up frequently with customers uploading scripts in
> > binary mode trying to run #!/usr/bin/perl^M. The solution for me was
> > to just do the following:
> >
> > cd /usr/bin
> > sudo ln -s perl perl^V^M
>
> So none of your customers tried '#!/usr/bin/perl -w^M'? (Come on,
> doesn't everyone use -w?)
>
> I'm not for treating \r as IFS in the kernel, but the "simple one-time"
> solution is not perfect..
>

'#!/usr/bin/perl -w^M' works without any special handling; the link is
not needed:

11:15:52 jcoy@d-hopper::~
$ cat -vet foo.pl
#!/usr/bin/perl -w^M$
^M$
print "Hello, World!\n";^M$
11:16:52 jcoy@d-hopper::~
$ ./foo.pl
Hello, World!

Jeff
--
The Harvard Law states: Under controlled conditions of light, temperature,
humidity, and nutrition, the organism will do as it damn well pleases.
-- Larry Wall

2001-03-06 20:26:59

by John Kodis

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Tue, Mar 06, 2001 at 11:36:29AM -0700, Jeff Coy wrote:

> '#!/usr/bin/perl -w^M' works without any special handling; the link is
> not needed:

This is the main reason that I think that the kernel should treat \r
as just another whitespace character: it's what most shells do, it's
what most users expect, and it's the least surprising behavior.

--
John Kodis <[email protected]>
Phone: 301-286-7376

2001-03-06 20:43:47

by Andreas Schwab

[permalink] [raw]
Subject: Re: binfmt_script and ^M

John Kodis <[email protected]> writes:

|> On Tue, Mar 06, 2001 at 11:36:29AM -0700, Jeff Coy wrote:
|>
|> > '#!/usr/bin/perl -w^M' works without any special handling; the link is
|> > not needed:
|>
|> This is the main reason that I think that the kernel should treat \r
|> as just another whitespace character: it's what most shells do

Do they? Bourne shells don't, tcsh doesn't, zsh doesn't.

Andreas.

--
Andreas Schwab "And now for something
SuSE Labs completely different."
[email protected]
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5

2001-03-06 21:00:57

by mirabilos

[permalink] [raw]
Subject: Re: binfmt_script and ^M

----- Original Message -----
From: "Jesse Pollard" <[email protected]>
To: <[email protected]>; "Richard B. Johnson" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Monday, 5. March 2001 19:14
Subject: Re: binfmt_script and ^M


> John Kodis <[email protected]>:
> > On Mon, Mar 05, 2001 at 08:40:22AM -0500, Richard B. Johnson wrote:
> >
> > > Somebody must have missed the boat entirely. Unix does not, never
> > > has, and never will end a text line with '\r'.
> >
> > Unix does not, never has, and never will end a text line with ' ' (a
> > space character) or with \t (a tab character). Yet if I begin a shell
> > script with '#!/bin/sh ' or '#!/bin/sh\t', the trailing white space is
> > stripped and /bin/sh gets exec'd. Since \r has no special significance
> > to Unix, I'd expect it to be treated the same as any other whitespace
> > character -- it should be stripped, and /bin/sh should get exec'd.
>
> Actually it does have some significance - it causes a return, then the
> following text overwrites the current text. Granted, this is only used
> occasionally for generating bold/underline/...
>
> This is used in some formatters (troff) occasionally, though it tends to
> use backspace now.

Less supports it, but ^H is used far more often.
ISO_646.irv:1991 aka ISO-IR-6 aka US-ASCII-7 _also_ defines
it, and we will no longer be ASCII-compatible if we
don't support CRLF line endings.
I also often have the opposite problem: LF endings in files which
are to be viewed under DOS. I use a 15-year-old text editor from
Digital Research (yes, DOS 3.41) which still works fine under W** and
DOSEMU; it looks like jstar, except that I miss find and replace.
IMHO those problems could be solved with programmes/kernels/libs
accepting LF as the line ending and CRLF (and possibly CRCRLF ...)
as a synonym for LF, while treating a CR not followed by LF differently.
I have seen this behaviour quite often in the past and use it myself, too
(except for native assembly progs).

> \r is not considered whitespace, though it should be possible to define
> it that way. A line terminator is always \n.
ACK

> Another point, is that the "#!/bin/sh" can have options added: it can be
> "#!/bin/sh -vx" and the option -vx is passed to the shell. The space is
> not just "stripped". It is used as a parameter separator. As such, the
> "stripping" is only because the first parameter is separated from the
> command by whitespace.

That's why I suggest treating CRLF (and only a CR directly followed by LF) as LF.

-mirabilos


2001-03-06 21:05:17

by Dr. Kelsey Hudson

[permalink] [raw]
Subject: Re: binfmt_script and ^M

On Tue, 6 Mar 2001, Peter Samuelson wrote:
> [Dr. Kelsey Hudson]
> > umm, last i checked a carriage return wasn't whitespace... space,
> > horizontal tab, vertical tab, form feed constitute whitespace IIRC...
>
> Where and when did you check? Several sources disagree with you.

a long while ago... i should have checked before i opened my mouth :p

Kelsey Hudson [email protected]
Software Engineer
Compendium Technologies, Inc (619) 725-0771
---------------------------------------------------------------------------

2001-03-06 21:11:07

by mirabilos

[permalink] [raw]
Subject: Re: binfmt_script and ^M

----- Original Message -----
From: "David Weinehall" <[email protected]>
To: "Sean Hunter" <[email protected]>; "Laramie Leavitt" <[email protected]>; <[email protected]>
Sent: Tuesday, 6. March 2001 15:37
Subject: Re: binfmt_script and ^M


> On Tue, Mar 06, 2001 at 03:12:42PM +0000, Sean Hunter wrote:
> >
> > I propose
> > /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude
> >
> > Any support?
>
> <sarcasm>
> Hey, let's go even further! Let's add support in all programs for \r\n.

That is no sarcasm, it is ridiculous. CRLF line endings are ISO-IR-6 and
US-ASCII standard, and even UN*X systems used them when they had printers
(typewriters?) as output devices, and no screens. With the Virtual Terminal
and Virtual Console stuff, times may have changed, but we still have so much
old stuff in it... I wouldn't remove any of it, nor have I thought of doing
so, but let me remind you of:
- lost+found
- using ESC (or Alt???) as META for _shell commands_ which
easily could be Ctrl-Left, Ctrl-Right, Ctrl-Del etc.
- EMACS :-((
- ED/EX/VI :-(


The following does _not_ have to do with any US-ASCII or ISO_646.irv:1991
standards which IIRC are inherited by POSIX.
> And why not make all programs use filenames that have an 8+3 char garbled
> equivalent where the last 3 are the indicators of the filetype. Oh, and
> let's do everything to make sure the user doesn't leave Gnome/KDE.
> And of course, let's add new features to all existing protocols and
> other standards to make them "superior" to other implementations.
> Oh, and of course, we must require an extra 64 MB of memory and
> 500 MB of diskspace for each release, and a 200MHz faster processor.
> And let us do all system settings through a registry.
>
> OH! Let's change the name of the operating-system to something more
> catchy. Hmmm. Let's see. Windows maybe...
> </sarcasm>
>
>
> /David

I _do_ _not_ like Windoze either, but we live in a world
where we have to cope with it. I even have to code Windoze
apps in order to support Linux (no comment on this)...

-mirabilos


2001-03-07 08:30:26

by Ondrej Sury

[permalink] [raw]
Subject: Re: binfmt_script and ^M

Sean Hunter <[email protected]> writes:

> I propose
> /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude

Well, to me it seems that you are intolerant.


I think that it should not be added to kernel because:

#!/bin/sh
/usr/bin/perl^M

will write
--
: No such file or directory
--
(in reality it writes '/usr/bin/perl<CR>: No such file or directory')

But what I do think is that a more meaningful message should be printed,
because it is not only a ^M issue. You could mistype the name of the
interpreter and not notice (/usr/bin/eprl or similar typos), and printing
a message saying 'script.pl: file not found' is confusing.

I see the problem somewhere else. There are editors and viewers (for example
midnight commander) which will hide the ^M from you and leave you totally
confused (that's why I am using emacs ;-), because you have no idea why it
doesn't work: everything seems ok with this _broken_ behaviour.
This is a really BAD thing.
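
One way a tool could avoid hiding the ^M is to print control characters in
caret notation whenever it echoes an interpreter name. The following C
sketch is only an illustration of that idea; the name show_controls is
invented here, and unlike `cat -v` it also escapes tab and newline.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy src into dst, writing every ASCII control
 * character in caret notation ('\r' becomes "^M", DEL becomes "^?").
 * Output is truncated if dst is too small. Returns the length written,
 * not counting the terminating '\0'.
 */
size_t show_controls(const char *src, char *dst, size_t dstlen)
{
    size_t out = 0;

    if (dstlen == 0)
        return 0;
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        int ctrl = (c < 0x20 || c == 0x7f);

        if (out + (ctrl ? 2 : 1) >= dstlen)
            break;                          /* keep room for the terminator */
        if (ctrl) {
            dst[out++] = '^';
            dst[out++] = (char)(c ^ 0x40);  /* 0x0d -> 'M', 0x7f -> '?' */
        } else {
            dst[out++] = (char)c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

An error message built this way would read "/usr/bin/perl^M: No such file
or directory", which makes the stray carriage return visible.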

And an even worse thing is the intolerance shown in this thread. People
are not morons just because they don't understand a confusing message which
the shell gives them. The behaviour of the kernel is good; the error message
is wrong. What should be fixed is the error message.

--
Ondřej Surý <[email protected]> Globe Internet s.r.o. http://globe.cz/
Tel: +420235365000 Fax: +420235365009 Pláničkova 1, 162 00 Praha 6
Mob: +420605204544 ICQ: 24944126 Mapa: http://globe.namape.cz/
GPG fingerprint: CC91 8F02 8CDE 911A 933F AE52 F4E6 6A7C C20D F273

2001-03-09 16:58:19

by mirabilos

[permalink] [raw]
Subject: Re: binfmt_script and ^M

----- Original Message -----
From: "Sean Hunter" <[email protected]>
To: "Thorsten Glaser Geuer" <[email protected]>
Sent: Thursday, 8. March 2001 13:01
Subject: Re: binfmt_script and ^M


> On Tue, Mar 06, 2001 at 09:10:26PM -0000, Thorsten Glaser Geuer wrote:
> > ----- Original Message -----
> > From: "David Weinehall" <[email protected]>
> > To: "Sean Hunter" <[email protected]>; "Laramie Leavitt" <[email protected]>; <[email protected]>
> > Sent: Tuesday, 6. March 2001 15:37
> > Subject: Re: binfmt_script and ^M
> >
> >
> > > On Tue, Mar 06, 2001 at 03:12:42PM +0000, Sean Hunter wrote:
> > > >
> > > > I propose
> > > > /proc/sys/kernel/im_too_lame_to_learn_how_to_use_the_most_basic_of_unix_tools_so_i_want_the_kernel_to_be_filled_with_crap_to_disguise_my_ineptitude
> > > >
> > > > Any support?
> > >
> > > <sarcasm>
> > > Hey, let's go even further! Let's add support in all programs for \r\n.
> >
> > That is no sarcasm, it is ridiculous. CRLF line endings are ISO-IR-6 and
> > US-ASCII standard, and even UN*X systems used them when they had printers
> > (typewriters?) as output devices, and no screens. With the Virtual Terminal
> > and Virtual Console stuff, times may have changed, but we still have so much
> > old stuff in it... I wouldn't remove any of it, nor have I thought of doing
> > so, but let me remind you of:
> > - lost+found
> > - using ESC (or Alt???) as META for _shell commands_ which
> > easily could be Ctrl-Left, Ctrl-Right, Ctrl-Del etc.
> > - EMACS :-((
> > - ED/EX/VI :-(
> >
>
> This is pure bullshit, so I'm not copying my response to the list. None of that
> depends on the kernel.

I never said that it depends on the kernel. The kernel _has to_ depend on
_standards_, because otherwise it wouldn't be POSIX compliant, it wouldn't obey
one of the most basic ISO standards nowadays, etc. etc.
The kernel is faulty, and it has to treat &h0D as CARRIAGE RETURN whenever this
behaviour seems expectable (stress on -able).

> If he's too lame to do:
>
> for i in `grep -rl $'^#!.*perl\r' .`; do
> perl -pi -e 's|\r||g' "$i"
> done
>
> ...or...
>
> ln -snf `which perl`{,\r}
>
> ...or...
>
> change his scripts to use "perl -w", which works, and he should use anyway
>
> ...or...
>
> Run his scripts directly by doing "perl /the/path/to/broken/script" (which also
> works)
>
> ...or any of the other solutions that various people suggested

_I_ am not talking about perl, because in my eyes this is a waste. But there
are numerous other programmes with this problem, too.

> ...then why should we accept a kernel patch to fix this non-problem? This is
> just plain ludicrous.
>
>
> Sean
>
> >
> > The following does _not_ have to do with any US-ASCII or ISO_646.irv:1991
> > standards which IIRC are inherited by POSIX.
> > > And why not make all programs use filenames that have an 8+3 char garbled
> > > equivalent where the last 3 are the indicators of the filetype. Oh, and
> > > let's do everything to make sure the user doesn't leave Gnome/KDE.
> > > And of course, let's add new features to all existing protocols and
> > > other standards to make them "superior" to other implementations.
> > > Oh, and of course, we must require an extra 64 MB of memory and
> > > 500 MB of diskspace for each release, and a 200MHz faster processor.
> > > And let us do all system settings through a registry.
> > >
> > > OH! Let's change the name of the operating-system to something more
> > > catchy. Hmmm. Let's see. Windows maybe...
> > > </sarcasm>
> > >
> > >
> > > /David
> >
> > I _do_ _not_ like Windoze either, but we live in a world
> > where we have to cope with it. I even have to code Windoze
> > apps in order to support Linux (no comment on this)...
> >
-mirabilos