2002-12-04 18:44:44

by mbm

[permalink] [raw]
Subject: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

> Alexander Riesen wrote:
> >On Wed, Dec 04, 2002 at 12:34:19PM +0100, Matthias Andree wrote:
> >> SuSE Linux 7.0, 7.3, 8.1 (2.4.19 kernel, binfmt_script.c identical to
> >> 2.4.20 BK):
> >> $ /tmp/try.pl
> >> /bin/sh: -- # -*- perl -*- -T: invalid option
> >
> >looks correct. The interpreter (/bin/sh) has got everything after
> >its name. IOW: "-- # -*- perl -*- -T"
> >It's just solaris' shell (/bin/sh) just ignores options starting with
> >"--". And freebsd's as well.
>
> FreeBSD splits #! magic strings on whitespace and passes multiple
> arguments. Linux passes everything after the first whitespace as a
> single argument but strips trailing whitespace. NetBSD does the same as
> Linux but passes trailing whitespace as part of the argument.

NetBSD is also broken, in that case. Everything I ever used that
supported #! except linux worked sensibly. I even wrote myself
a patch, which still applied to 2.4 fairly recently. It's not
perfect (has a fixed number of args to avoid allocating from the
heap, would be easy to change), but it's functional.


*** linux-2.4.20-rc1/fs/binfmt_script.c Sun Nov 10 20:34:21 2002
--- linux-mbm/fs/binfmt_script.c Sun Nov 10 20:32:30 2002
***************
*** 3,8 ****
--- 3,10 ----
*
* Copyright (C) 1996 Martin von L?wis
* original #!-checking implemented by tytso.
+ * 2001-10-06 - [email protected] - reimplement to be more like *BSD/SVR4
+ * (allow multiple args and terminate at '#')
*/

#include <linux/module.h>
***************
*** 14,25 ****
#include <linux/file.h>
#include <linux/smp_lock.h>

static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
{
! char *cp, *i_name, *i_arg;
struct file *file;
char interp[BINPRM_BUF_SIZE];
int retval;

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
return -ENOEXEC;
--- 16,96 ----
#include <linux/file.h>
#include <linux/smp_lock.h>

+ #define MAX_SCRIPT_ARGS 16
+
+ /*
+ * Split line into arguments separated by white space.
+ * Line terminates at first of NUL, NL or '#' character.
+ *
+ * Caller must supply suitable array to hold pointers.
+ * Input string is modified to separate argument strings with NUL.
+ * *nargs contains max args allowed on entry, actual number found on exit.
+ */
+ static void split_line(char *p, char **argsout, int *nargs)
+ {
+ int maxn;
+ int in_arg;
+ char *argstart;
+ char c;
+
+ maxn = *nargs;
+ *nargs = 0;
+
+ if( maxn < 1 )
+ return;
+
+ in_arg = 0;
+ argstart = 0;
+ c = *p;
+
+ for (;;) {
+ int finished = 0;
+
+ switch (c) {
+ case '#':
+ case '\n':
+ case '\0':
+ finished = 1;
+ /*FALLTHROUGH*/
+
+ case ' ':
+ case '\t':
+ if (in_arg) {
+ /* finished an actual argument */
+ int n = *nargs;
+ *p = '\0';
+ argsout[n++] = argstart;
+ *nargs = n;
+ in_arg = 0;
+ argstart = 0;
+ if (n == maxn)
+ finished = 1;
+ }
+ if (finished)
+ return;
+ break;
+
+ default:
+ if (!in_arg) {
+ /* start new argument */
+ in_arg = 1;
+ argstart = p;
+ }
+ break;
+ }
+ c = * ++p;
+ }
+ }
+
static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
{
! char *i_name;
struct file *file;
char interp[BINPRM_BUF_SIZE];
int retval;
+ char *args[1+MAX_SCRIPT_ARGS]; /* includes interpreter name */
+ int nargs = 1+MAX_SCRIPT_ARGS; /* max on i/p to split, actual on o/p */
+ char **pp = &args[0];

if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') || (bprm->sh_bang))
return -ENOEXEC;
***************
*** 34,83 ****
bprm->file = NULL;

bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
! if ((cp = strchr(bprm->buf, '\n')) == NULL)
! cp = bprm->buf+BINPRM_BUF_SIZE-1;
! *cp = '\0';
! while (cp > bprm->buf) {
! cp--;
! if ((*cp == ' ') || (*cp == '\t'))
! *cp = '\0';
! else
! break;
! }
! for (cp = bprm->buf+2; (*cp == ' ') || (*cp == '\t'); cp++);
! if (*cp == '\0')
! return -ENOEXEC; /* No interpreter name found */
! i_name = cp;
! i_arg = 0;
! for ( ; *cp && (*cp != ' ') && (*cp != '\t'); cp++)
! /* nothing */ ;
! while ((*cp == ' ') || (*cp == '\t'))
! *cp++ = '\0';
! if (*cp)
! i_arg = cp;
! strcpy (interp, i_name);
/*
! * OK, we've parsed out the interpreter name and
! * (optional) argument.
* Splice in (1) the interpreter's name for argv[0]
! * (2) (optional) argument to interpreter
* (3) filename of shell script (replace argv[0])
*
* This is done in reverse order, because of how the
* user environment and arguments are stored.
*/
remove_arg_zero(bprm);
! retval = copy_strings_kernel(1, &bprm->filename, bprm);
! if (retval < 0) return retval;
bprm->argc++;
! if (i_arg) {
! retval = copy_strings_kernel(1, &i_arg, bprm);
! if (retval < 0) return retval;
! bprm->argc++;
}
retval = copy_strings_kernel(1, &i_name, bprm);
! if (retval) return retval;
bprm->argc++;
/*
* OK, now restart the process with the interpreter's dentry.
*/
--- 105,144 ----
bprm->file = NULL;

bprm->buf[BINPRM_BUF_SIZE - 1] = '\0';
!
! split_line(bprm->buf + 2,args,&nargs);
!
! if( nargs == 0 )
! return -ENOEXEC;
!
/*
! * We now have the line tokenized, excluding the leading "#!".
* Splice in (1) the interpreter's name for argv[0]
! * (2) (optional) arguments to interpreter
* (3) filename of shell script (replace argv[0])
*
* This is done in reverse order, because of how the
* user environment and arguments are stored.
*/
+ i_name = *pp++;
+ --nargs;
+ strcpy(interp,i_name);
+
remove_arg_zero(bprm);
! retval = copy_strings_kernel(1,&bprm->filename,bprm);
! if (retval) return retval;
bprm->argc++;
!
! if( nargs > 0 ) {
! retval = copy_strings_kernel(nargs,pp,bprm);
! if (retval) return retval;
! bprm->argc += nargs;
}
+
retval = copy_strings_kernel(1, &i_name, bprm);
! if (retval) return retval;
bprm->argc++;
+
/*
* OK, now restart the process with the interpreter's dentry.
*/

>
> >Anyway - it's bash, not the bin_fmt.
>
> Bash is (correctly) complaining that it's been passed an invalid
> argument, but the reason for the different behaviour between it and
> FreeBSD is because of binfmt_script. There's no clearly defined standard
> for how this should behave.

Arguably, whatever the original BSD did. The linux version is clearly
bogus. (I fell over the perl thing too, since at work I have a common
home dir for multiple architectures where I therefore multiple binaries
of an up-to-date perl.)

-Malcolm


2002-12-04 19:15:34

by Petr Vandrovec

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

On 4 Dec 02 at 18:52, [email protected] wrote:
> > single argument but strips trailing whitespace. NetBSD does the same as
> > Linux but passes trailing whitespace as part of the argument.
>
> NetBSD is also broken, in that case. Everything I ever used that
> supported #! except linux worked sensibly. I even wrote myself
> a patch, which still applied to 2.4 fairly recently. It's not
> perfect (has a fixed number of args to avoid allocating from the
> heap, would be easy to change), but it's functional.

Without adding support to parse " and ' it is unacceptable. I have
dozens of scripts which use argument with spaces inside... Also
all references I was able to found talks about "single optional
argument" (SCO, AIX)... Try running script containing

#! /bin/ls a b c

on your favorite system. If it will report
'/bin/ls: a b c: No such file or directory', or
'ls: 0653-341 The file a b c does not exist', system does not split
argument on spaces. If it will talk about 'a' not found, it splits them
on spaces.

And because of I was not able to find anything in POSIX which would say
that we should do split on spaces (not that I found that we should not),
I vote for leaving current behavior in Linux, and fixing perl manpage
(and eventually FreeBSD, if anyone is interested) instead.
Petr Vandrovec
[email protected]


2002-12-04 19:53:36

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Followup to: <[email protected]>
By author: "Petr Vandrovec" <[email protected]>
In newsgroup: linux.dev.kernel
>
> And because of I was not able to find anything in POSIX which would say
> that we should do split on spaces (not that I found that we should not),
> I vote for leaving current behavior in Linux, and fixing perl manpage
> (and eventually FreeBSD, if anyone is interested) instead.
>

Classic catch-22: POSIX won't standardize it because of lack of
consistency between UNIX implementations, although everyone pretty
much agrees it would be a desirable feature to add to the standard.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-12-05 02:34:33

by Miles Bader

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

"Petr Vandrovec" <[email protected]> writes:
> And because of I was not able to find anything in POSIX which would say
> that we should do split on spaces (not that I found that we should not),
> I vote for leaving current behavior in Linux, and fixing perl manpage
> (and eventually FreeBSD, if anyone is interested) instead.

Indeed.

Whatever linux does, that usage remains massively unportable, so the man
page should be changed regardless.

-Miles
--
A zen-buddhist walked into a pizza shop and
said, "Make me one with everything."

2002-12-08 21:38:01

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Hi!

> > And because of I was not able to find anything in POSIX which would say
> > that we should do split on spaces (not that I found that we should not),
> > I vote for leaving current behavior in Linux, and fixing perl manpage
> > (and eventually FreeBSD, if anyone is interested) instead.
> >
>
> Classic catch-22: POSIX won't standardize it because of lack of
> consistency between UNIX implementations, although everyone pretty
> much agrees it would be a desirable feature to add to the standard.

Why can't we simply have /bin/bash_that_splits_args_itself
?
Pavel
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...

2002-12-09 18:18:24

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Hi!

> > Why can't we simply have /bin/bash_that_splits_args_itself
>
> We could, but it would in practice mean doing an extra exec() for each
> executable. This seems undesirable.

Only for executables that need argument spliting... For such scripts I
guess we can get handle the overhead.

Pavel

--
Casualities in World Trade Center: ~3k dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.

2002-12-09 18:12:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Pavel Machek wrote:
>
> Why can't we simply have /bin/bash_that_splits_args_itself
> ?
> Pavel

We could, but it would in practice mean doing an extra exec() for each
executable. This seems undesirable.

-hpa

2002-12-09 19:15:31

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Followup to: <[email protected]>
By author: Pavel Machek <[email protected]>
In newsgroup: linux.dev.kernel
>
> Hi!
>
> > > Why can't we simply have /bin/bash_that_splits_args_itself
> >
> > We could, but it would in practice mean doing an extra exec() for each
> > executable. This seems undesirable.
>
> Only for executables that need argument spliting... For such scripts I
> guess we can get handle the overhead.
>

We probably can, but a better question is really: what are the
semantics that users expect? Given that Unices are by and large
inconsistent, we should pick the behaviour that makes sense to the
most people. I suspect that most people would expect whitespace
partition.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2002-12-09 23:56:31

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

On Mon, 2002-12-09 at 19:23, H. Peter Anvin wrote:
> We probably can, but a better question is really: what are the
> semantics that users expect? Given that Unices are by and large
> inconsistent, we should pick the behaviour that makes sense to the
> most people. I suspect that most people would expect whitespace
> partition.

I'd rather keep it as is. We should be doing IFS partition with quoting,
UTF-8 awareness according to locale and locale specific rules on
whitespace. That says "userspace" all over it.

We can keep this out of kernel which is good - though it reminds me we
do need to fix backspace/delete in the tty layer on a unicode configured
tty to do the right thing. (Extra ioctls for unicode delete characters
are a longer less funny subject though)


2002-12-10 00:21:41

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Alan Cox wrote:
>
> I'd rather keep it as is. We should be doing IFS partition with quoting,
> UTF-8 awareness according to locale and locale specific rules on
> whitespace. That says "userspace" all over it.
>

Actually, I would argue we definitely should *not*.

-hpa

2002-12-10 00:26:46

by Alan

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

On Tue, 2002-12-10 at 00:28, H. Peter Anvin wrote:
> Alan Cox wrote:
> >
> > I'd rather keep it as is. We should be doing IFS partition with quoting,
> > UTF-8 awareness according to locale and locale specific rules on
> > whitespace. That says "userspace" all over it.
> >
>
> Actually, I would argue we definitely should *not*.

I don't see how you can do otherwise. The very definition of
"whitespace" is locale specific. Throwing it at the target as one string
lets the kernel stay out of locales, languages, policy and other parsing
horrors. Splitting it has to be done properly or its a non reversible
operation and tools can't correct the kernel screwups

2002-12-10 00:29:41

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Alan Cox wrote:
>
> I don't see how you can do otherwise. The very definition of
> "whitespace" is locale specific. Throwing it at the target as one string
> lets the kernel stay out of locales, languages, policy and other parsing
> horrors. Splitting it has to be done properly or its a non reversible
> operation and tools can't correct the kernel screwups
>

In theory I agree with you. I'm just wondering what is the most practical.

-hpa

2002-12-10 19:58:12

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Re: #! incompatible -- binfmt_script.c broken?

Hi!

> > > > Why can't we simply have /bin/bash_that_splits_args_itself
> > >
> > > We could, but it would in practice mean doing an extra exec() for each
> > > executable. This seems undesirable.
> >
> > Only for executables that need argument spliting... For such scripts I
> > guess we can get handle the overhead.
> >
>
> We probably can, but a better question is really: what are the
> semantics that users expect? Given that Unices are by and large
> inconsistent, we should pick the behaviour that makes sense to the
> most people. I suspect that most people would expect whitespace
> partition.

Most people would also expect " and ' to work, and $FOO to work
:-(. So I believe keeping it simple is right.
Pavel
--
Worst form of spam? Adding advertisment signatures ala sourceforge.net.
What goes next? Inserting advertisment *into* email?