2004-09-01 20:34:28

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Jeremy Allison wrote:
> (most of the streams in a Word file for instance are quite small)

Streams in a Word file?

Are you saying that when I copy a .doc file onto my Linux box and off,
I lose part of a Word document?

-- Jamie


2004-09-01 20:38:31

by Jeremy Allison

Subject: Re: silent semantic changes with reiser4

On Wed, Sep 01, 2004 at 09:31:01PM +0100, Jamie Lokier wrote:
>
> I meant when I copy not using Samba. For example, I copy the .doc
> file in Windows NT to an FTP server.
>
> Does the FTP operation magically linearise the .doc streams on demand?
>
> Or does FTP lose part of the Word document?

Good question. It depends if the Microsoft ftp client is streams-aware,
and understands the Microsoft OLE structured storage format and will do
the linearisation on demand or not. I must confess I haven't tested this,
as I don't ever run Windows other than on vmware sessions for Samba testing
these days :-).

Probably a non-Microsoft ftp client would lose part of the word doc.

Jeremy.

2004-09-01 20:38:32

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Jeremy Allison wrote:
> > Streams in a Word file?
>
> Yep.
>
> > Are you saying that when I copy a .doc file onto my Linux box and off,
> > I lose part of a Word document?
>
> Right now no, because when Samba refuses the stream open, Word falls
> back into a "tar"-like mode where it linearises the streams into the
> data (it's a legacy mode for storing data on a FAT drive, not an NTFS
> drive). However, the problem is that every currently supported Microsoft
> OS has streams-capable NTFS support.

I meant when I copy not using Samba. For example, I copy the .doc
file in Windows NT to an FTP server.

Does the FTP operation magically linearise the .doc streams on demand?

Or does FTP lose part of the Word document?

-- Jamie

2004-09-01 20:59:50

by Jeremy Allison

Subject: Re: silent semantic changes with reiser4

On Wed, Sep 01, 2004 at 09:47:46PM +0100, Jamie Lokier wrote:
> Jeremy Allison wrote:
> > > I meant when I copy not using Samba. For example, I copy the .doc
> > > file in Windows NT to an FTP server.
> > >
> > > Does the FTP operation magically linearise the .doc streams on demand?
> > > Or does FTP lose part of the Word document?
> >
> > Good question. It depends if the Microsoft ftp client is streams-aware,
> > and understands the Microsoft OLE structured storage format and will do
> > the linearisation on demand or not. I must confess I haven't tested this,
> > as I don't ever run Windows other than on vmware sessions for Samba testing
> > these days :-).
> >
> > Probably a non-Microsoft ftp client would lose part of the word doc.
>
> So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> will _all_ lose part of a Word document when they handle it on a
> Windows box?
>
> Ouch!

Yep. It's the meta data that Word stores in streams that will get lost.

Jeremy.

2004-09-01 21:18:55

by David Lang

Subject: Re: silent semantic changes with reiser4

What happens when you use one of the many software packages around that
let Windows access an NFS server? When you copy the file to an NFS-accessed
drive, does it lose part of the data?

David Lang

On Wed, 1 Sep 2004, Jamie Lokier wrote:

> Date: Wed, 1 Sep 2004 21:31:01 +0100
> From: Jamie Lokier <[email protected]>
> Subject: Re: silent semantic changes with reiser4
>
> Jeremy Allison wrote:
>>> Streams in a Word file?
>>
>> Yep.
>>
>>> Are you saying that when I copy a .doc file onto my Linux box and off,
>>> I lose part of a Word document?
>>
>> Right now no, because when Samba refuses the stream open, Word falls
>> back into a "tar"-like mode where it linearises the streams into the
>> data (it's a legacy mode for storing data on a FAT drive, not an NTFS
>> drive). However, the problem is that every currently supported Microsoft
>> OS has streams-capable NTFS support.
>
> I meant when I copy not using Samba. For example, I copy the .doc
> file in Windows NT to an FTP server.
>
> Does the FTP operation magically linearise the .doc streams on demand?
>
> Or does FTP lose part of the Word document?
>
> -- Jamie
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2004-09-01 21:41:25

by Lee Revell

Subject: Re: silent semantic changes with reiser4

On Wed, 2004-09-01 at 16:51, Jeremy Allison wrote:
> On Wed, Sep 01, 2004 at 09:47:46PM +0100, Jamie Lokier wrote:
> > Jeremy Allison wrote:
> > > > I meant when I copy not using Samba. For example, I copy the .doc
> > > > file in Windows NT to an FTP server.
> > > >
> > > > Does the FTP operation magically linearise the .doc streams on demand?
> > > > Or does FTP lose part of the Word document?
> > >
> > > Good question. It depends if the Microsoft ftp client is streams-aware,
> > > and understands the Microsoft OLE structured storage format and will do
> > > the linearisation on demand or not. I must confess I haven't tested this,
> > > as I don't ever run Windows other than on vmware sessions for Samba testing
> > > these days :-).
> > >
> > > Probably a non-Microsoft ftp client would lose part of the word doc.
> >
> > So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> > will _all_ lose part of a Word document when they handle it on a
> > > Windows box?
> >
> > Ouch!
>
> Yep. It's the meta data that Word stores in streams that will get lost.
>

This is shocking. When was this behavior introduced?

Lee

2004-09-01 23:16:42

by Jamie Lokier

Subject: Re: silent semantic changes with reiser4

Jeremy Allison wrote:
> > I meant when I copy not using Samba. For example, I copy the .doc
> > file in Windows NT to an FTP server.
> >
> > Does the FTP operation magically linearise the .doc streams on demand?
> > Or does FTP lose part of the Word document?
>
> Good question. It depends if the Microsoft ftp client is streams-aware,
> and understands the Microsoft OLE structured storage format and will do
> the linearisation on demand or not. I must confess I haven't tested this,
> as I don't ever run Windows other than on vmware sessions for Samba testing
> these days :-).
>
> Probably a non-Microsoft ftp client would lose part of the word doc.

So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
will _all_ lose part of a Word document when they handle it on a
Windows box?

Ouch!

The only sensible implementation I can imagine would be if the OS
linearised multi-stream Word documents into the non-stream format
automatically for all programs which don't know about streams.

Which is of course what I would like to implement for Linux...

- Jamie

2004-09-01 23:40:09

by Oliver Hunt

Subject: Re: silent semantic changes with reiser4

The loss of forks in the file is exactly the problem you used to have
when transferring native Mac files to a PC...

This meant that in order to transfer files to a different filesystem you
often needed to tar/zip/whatever them first.

Bear in mind this would let us do the whole MacOS thing of putting an
entire application (plus plugins, etc.) inside one "file"...

--Oliver

On Wed, 1 Sep 2004 13:51:40 -0700, Jeremy Allison <[email protected]> wrote:
> On Wed, Sep 01, 2004 at 09:47:46PM +0100, Jamie Lokier wrote:
> > Jeremy Allison wrote:
> > > > I meant when I copy not using Samba. For example, I copy the .doc
> > > > file in Windows NT to an FTP server.
> > > >
> > > > Does the FTP operation magically linearise the .doc streams on demand?
> > > > Or does FTP lose part of the Word document?
> > >
> > > Good question. It depends if the Microsoft ftp client is streams-aware,
> > > and understands the Microsoft OLE structured storage format and will do
> > > the linearisation on demand or not. I must confess I haven't tested this,
> > > as I don't ever run Windows other than on vmware sessions for Samba testing
> > > these days :-).
> > >
> > > Probably a non-Microsoft ftp client would lose part of the word doc.
> >
> > So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> > will _all_ lose part of a Word document when they handle it on a
> > > Windows box?
> >
> > Ouch!
>
> Yep. It's the meta data that Word stores in streams that will get lost.
>
>
>
> Jeremy.

2004-09-02 00:28:56

by Jeremy Allison

Subject: Re: silent semantic changes with reiser4

On Wed, Sep 01, 2004 at 09:19:45PM +0100, Jamie Lokier wrote:
> Jeremy Allison wrote:
> > (most of the streams in a Word file for instance are quite small)
>
> Streams in a Word file?

Yep.

> Are you saying that when I copy a .doc file onto my Linux box and off,
> I lose part of a Word document?

Right now no, because when Samba refuses the stream open, Word falls
back into a "tar"-like mode where it linearises the streams into the
data (it's a legacy mode for storing data on a FAT drive, not an NTFS
drive). However, the problem is that every currently supported Microsoft
OS has streams-capable NTFS support.

This means that in a future MS-Office revision, this backwards support
may be broken by accident or by design (less likely, Microsoft really
don't do that kind of thing without very high level requests :-) and no
testers at Microsoft will notice (because they only test against MS servers).

Jeremy.

2004-09-02 10:56:43

by Alan

Subject: Re: silent semantic changes with reiser4

On Mer, 2004-09-01 at 21:26, Jeremy Allison wrote:
> Right now no, because when Samba refuses the stream open, Word falls
> back into a "tar"-like mode where it linearises the streams into the
> data (it's a legacy mode for storing data on a FAT drive, not an NTFS
> drive). However, the problem is that every currently supported Microsoft
> OS has streams-capable NTFS support.

That implies that samba can do the same transform set without kernel
help and you'd even get the advantages of being able to transfer the
stuff around sanely afterwards.

What I don't understand is the tie between Linux having such streams and
Windows doing it for Samba to work. Netatalk has always handled this for
Macintosh, and portably. Presumably any Samba support would need to
handle OSes without wacky files for portability too?

2004-09-02 11:14:27

by Oliver Neukum

Subject: Re: silent semantic changes with reiser4

Am Donnerstag, 2. September 2004 11:48 schrieb Alan Cox:
> What I don't understand is the tie between Linux having such streams and
> Windows doing it for Samba to work. Netatalk has always handled this for
> Macintosh and portably. Presumably any Samba support would need to
> handle OS's without wacky files for portability too ?

Can you do an atomic rename of all streams without kernel support?

Regards
Oliver

2004-09-02 12:06:57

by Alan

Subject: Re: silent semantic changes with reiser4

On Iau, 2004-09-02 at 12:13, Oliver Neukum wrote:
> Am Donnerstag, 2. September 2004 11:48 schrieb Alan Cox:
> > What I don't understand is the tie between Linux having such streams and
> > Windows doing it for Samba to work. Netatalk has always handled this for
> > Macintosh and portably. Presumably any Samba support would need to
> > handle OS's without wacky files for portability too ?
>
> Can you do an atomic rename of all streams without kernel support?

That depends how SAMBA chooses to handle the problem internally and how
it chooses to store the data. The netatalk people have atomicity from
the view of clients but not from the unix fs internal view.
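
The atomicity question can be made concrete with a small sketch. It
assumes two hypothetical userspace layouts (neither is taken from Samba
or netatalk): "sidecar" files named <file>.<stream> next to the main
file, and a "bundle" directory that holds every stream. The helper names
and file names are invented for illustration.

```python
import os
import tempfile


def rename_sidecar(base_dir, old, new, streams):
    """Sidecar layout: one rename() per stream.

    A crash between two calls leaves some streams under the old name
    and some under the new one, so the rename is not atomic on disk.
    """
    calls = 0
    os.rename(os.path.join(base_dir, old), os.path.join(base_dir, new))
    calls += 1
    for stream in streams:
        os.rename(
            os.path.join(base_dir, f"{old}.{stream}"),
            os.path.join(base_dir, f"{new}.{stream}"),
        )
        calls += 1
    return calls


def rename_bundle(base_dir, old, new):
    """Bundle layout: all streams live in one directory, moved in one call."""
    os.rename(os.path.join(base_dir, old), os.path.join(base_dir, new))
    return 1


def demo():
    with tempfile.TemporaryDirectory() as d:
        # Sidecar layout: main file plus two hypothetical stream files.
        for name in ("report.doc", "report.doc.summary", "report.doc.thumb"):
            open(os.path.join(d, name), "w").close()
        sidecar_calls = rename_sidecar(
            d, "report.doc", "final.doc", ["summary", "thumb"]
        )

        # Bundle layout: a directory containing the same three streams.
        os.mkdir(os.path.join(d, "notes.doc"))
        for name in ("data", "summary", "thumb"):
            open(os.path.join(d, "notes.doc", name), "w").close()
        bundle_calls = rename_bundle(d, "notes.doc", "notes-final.doc")

        return sidecar_calls, bundle_calls, sorted(os.listdir(d))


if __name__ == "__main__":
    print(demo())
```

With the sidecar layout a server can still hide the intermediate states
from its own clients (for example by locking), which matches the
"atomic from the view of clients" description, but anything reading the
Unix filesystem directly can observe them.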

Alan

2004-09-02 13:51:52

by Theodore Ts'o

Subject: Re: silent semantic changes with reiser4

On Wed, Sep 01, 2004 at 01:51:40PM -0700, Jeremy Allison wrote:
> > So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> > will _all_ lose part of a Word document when they handle it on a
> > Windows box?
> >
> > Ouch!
>
> Yep. It's the meta data that Word stores in streams that will get lost.

And this is why I believe that using streams in applications is, well,
ill-advised. Indeed, one of my concerns with providing streams
support is that application authors may make the mistake of using it,
and we will be back to the bad old days (when MacOS made this mistake)
where you will need to binhex files before you ftp them (and unbinhex
them on the other side) --- and if you forget, the resulting file will
be useless.

I understand why the Samba folks want this feature very badly;
however, hopefully other projects will know enough *not* to use
streams once they become available in Linux....

- Ted

2004-09-02 14:53:39

by Stuart Young

Subject: Re: silent semantic changes with reiser4

On Thu, 2 Sep 2004 22:54, Theodore Ts'o wrote:
> On Wed, Sep 01, 2004 at 01:51:40PM -0700, Jeremy Allison wrote:
> > > So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> > > will _all_ lose part of a Word document when they handle it on a
> > > Windows box?
> > >
> > > Ouch!
> >
> > Yep. It's the meta data that Word stores in streams that will get lost.
>
> And this is why I believe that using streams in applications is, well,
> ill-advised. Indeed, one of my concerns with providing streams
> support is that application authors may make the mistake of using it,
> and we will be back to the bad old days (when MacOS made this mistake)
> where you will need to binhex files before you ftp them (and unbinhex
> them on the other side) --- and if you forget, the resulting file will
> be useless.

At least currently (to my knowledge anyway), all stream use in Windows is for
data that is not important, and that can be regenerated either from
filesystem metadata or (more usually) from the main file stream itself.

This sort of data is really where streams excel, by providing a way to access
data that would otherwise take time/cpu to regenerate over and over, but that
in itself is not indispensable. Good examples of this are indexes of data
within a document, details of who owns/created/modified the document, common
views or reformatting of the data, etc. With audio/video/graphics, you could
store lower quality transforms of data (eg: stereo to mono, resolution
reduction, thumbnails, etc) in the streams for a file. With a word document,
it could be things like an index (assuming it's auto-generated from section
headings). With a database, it could be the indexes, and a few views that are
expensive time-wise to generate. All of these are easily regenerated from the
original data stream, but doing so takes a while. And if you've got the disk,
why not use it?

If streams were always to be considered volatile, then you could do all sorts
of interesting things with them. Any disk cleanup mechanism you have could
also reap old streams specifically if the disk gets below a certain amount
free. This means that old streams that are hanging about don't end up wasting
all your disk space. Of course, you'd want a way to disable this (for servers
mainly), and streams would have to be considered volatile on more than just
Linux as a platform for this to be useful.
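
As a rough sketch of the reaping idea, assume the volatile streams are
stored as ordinary files in a cache directory (a hypothetical stand-in,
since no real stream API is involved here). The function name, the
threshold parameter and the injectable free-space probe are all
illustrative assumptions.

```python
import os
import shutil


def reap_streams(cache_dir, min_free_bytes, free_bytes=None):
    """Delete least-recently-accessed cache files until enough space is free.

    cache_dir      -- directory standing in for the volatile streams
    min_free_bytes -- stop reaping once this much space is free
    free_bytes     -- optional callable returning the free space in bytes;
                      defaults to asking the filesystem that holds cache_dir
    """
    if free_bytes is None:
        free_bytes = lambda: shutil.disk_usage(cache_dir).free

    # Collect (access time, path) pairs so the oldest entries go first.
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            entries.append((os.stat(path).st_atime, path))
    entries.sort()

    removed = []
    for _, path in entries:
        if free_bytes() >= min_free_bytes:
            break
        os.remove(path)
        removed.append(os.path.basename(path))
    return removed
```

Setting min_free_bytes to 0 effectively disables reaping, which is one
way to express the "disable this for servers" case.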

Note that I'm not particularly advocating streams here. I'm just pointing out
'how' it could be useful. It could be very easy to misuse streams and cause
huge problems (as per Ted's comments), but it's always good to know the other
side of the argument.

--
Stuart Young (aka Cef)
[email protected] is for LKML and related email only

2004-09-03 07:49:36

by Helge Hafting

Subject: Re: silent semantic changes with reiser4

Stuart Young wrote:

>On Thu, 2 Sep 2004 22:54, Theodore Ts'o wrote:
>>On Wed, Sep 01, 2004 at 01:51:40PM -0700, Jeremy Allison wrote:
>>>>So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
>>>>will _all_ lose part of a Word document when they handle it on a
>>>>Windows box?
>>>>
>>>>Ouch!
>>>
>>>Yep. It's the meta data that Word stores in streams that will get lost.
>>
>>And this is why I believe that using streams in applications is, well,
>>ill-advised. Indeed, one of my concerns with providing streams
>>support is that application authors may make the mistake of using it,
>>and we will be back to the bad old days (when MacOS made this mistake)
>>where you will need to binhex files before you ftp them (and unbinhex
>>them on the other side) --- and if you forget, the resulting file will
>>be useless.

This is not a problem with multiple streams implemented right - as a
directory. You don't stay away from directories just because you have to
tar them in order to put them on ftp sites? ;-)

Still, you're right that apps using streams just for the hell of it is bad.
That sort of thing happens all the time when something new shows up;
some people will use it without thinking.

>
>At least currently (to my knowledge anyway) all stream support in Windows is
>data that is not important, and that can be either regenerated from
>filesystem metadata or (more usually) the main file stream itself.
>
>This sort of data is really where streams excel, by providing a way to access
>data that would otherwise take time/cpu to regenerate over and over, but that
>in itself is not indispensable.
>
Streams as a cache.

>Good examples of this are indexes of data
>within a document, details of who owns/created/modified the document, common
>views or reformatting of the data, etc. With audio/video/graphics, you could
>store lower quality transforms of data (eg: stereo to mono, resolution
>reduction, thumbnails, etc) in the streams for a file. With a word document,
>it could be things like an index (assuming it's auto-generated from section
>headings). With a database, it could be the indexes, and a few views that are
>expensive time-wise to generate. All of these are easily regenerated from the
>original data stream, but takes a while. And if you've got the disk, why not
>use it?
>
>
Actually, some of these are bad examples of using streams.
Why can't that index be stored _in_ the document?
After all, the word processor is the one that knows the document's
internal format, how to generate the index, and how to use it. Well,
this example is bad anyway, as an index can be created so fast from
a well structured document that there is little need for storing one.
(Example - see how lyx keeps a live index in the "navigate" menu...)

A document format may also contain fields specifying who made it, or
even a log of who modified it and when.

It seems to me that streams are more useful for stuff that the file's
main application doesn't deal with itself, such as attaching icons that
follow the file around.

>If streams were always to be considered volatile, then you could do all sorts
>of interesting things with them. Any disk cleanup mechanism you have could
>also reap old streams specifically if the disk gets below a certain amount
>free.
>
That would limit streams to caching use _only_. Disks fill occasionally,
and we can't have _useful_ stuff disappearing at random.

Helge Hafting

2004-09-03 10:48:09

by Stuart Young

Subject: Re: silent semantic changes with reiser4

On Fri, 3 Sep 2004 17:53, Helge Hafting wrote:
> Stuart Young wrote:
> >On Thu, 2 Sep 2004 22:54, Theodore Ts'o wrote:
> >>On Wed, Sep 01, 2004 at 01:51:40PM -0700, Jeremy Allison wrote:
> >>>>So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> >>>>will _all_ lose part of a Word document when they handle it on a
> >>>>Windows box?
> >>>>
> >>>>Ouch!
> >>>
> >>>Yep. It's the meta data that Word stores in streams that will get lost.
> >>
> >>And this is why I believe that using streams in applications is, well,
> >>ill-advised. Indeed, one of my concerns with providing streams
> >>support is that application authors may make the mistake of using it,
> >>and we will be back to the bad old days (when MacOS made this mistake)
> >>where you will need to binhex files before you ftp them (and unbinhex
> >>them on the other side) --- and if you forget, the resulting file will
> >>be useless.
>
> This is not a problem with multiple streams implemented right - as a
> directory.

This works if you want (possibly editable) views of a single document. It
doesn't really work that well for other things. Even then, something has to
handle all the data translations and decoding, and that means you need to
store all the reference information somewhere so that the app knows what to
pass the original file to, so that it can be decoded for the user as they've
requested.

> You don't stay away from directories just because you have to tar
> them in order to put them on ftp sites? ;-)

No, but I know a lot of people who don't understand that with a lot of ftp
servers that if you append .tar.gz to the directory name, you can fetch the
directory. Sure it's all user education, but you have to remember that users
don't generally want to be educated. They just want to get on with doing
whatever they are doing.

> Still, you're right that apps using streams just for the hell of it is bad.
> That sort of thing happens all the time when something new shows up,
> some people will use it without thinking.

Hence my suggestion of disposable data in streams. As long as
people KNOW it's disposable, but useful to keep around as it cuts down the
time needed to do stuff, then apps will start to pick up transporting streams
properly. At least then (hopefully) no important information will get
lost. Once transporting streams becomes commonplace, then perhaps
streams can be used for more useful things.

> >At least currently (to my knowledge anyway) all stream support in Windows
> > is data that is not important, and that can be either regenerated from
> > filesystem metadata or (more usually) the main file stream itself.
> >
> >This sort of data is really where streams excel, by providing a way to
> > access data that would otherwise take time/cpu to regenerate over and
> > over, but that in itself is not indispensable.
>
> Streams as a cache.

Almost exactly.

> >Good examples of this are indexes of data
> >within a document, details of who owns/created/modified the document,
> > common views or reformatting of the data, etc. With audio/video/graphics,
> > you could store lower quality transforms of data (eg: stereo to mono,
> > resolution reduction, thumbnails, etc) in the streams for a file. With a
> > word document, it could be things like an index (assuming it's
> > auto-generated from section headings). With a database, it could be the
> > indexes, and a few views that are expensive time-wise to generate. All of
> > these are easily regenerated from the original data stream, but takes a
> > while. And if you've got the disk, why not use it?
>
> Actually, some of these are bad examples of using streams.
> Why can't that index be stored _in_ the document?
> After all, the word processor is the one who knows the document's
> internal format, how to generate the index, and how to use it. Well,
> this example is bad anyway as an index can be created so fast from
> a well structured document that there is little need for storing one.
> (Example - see how lyx keeps a live index in the "navigate" menu. . .)

The point of such information in my examples is that a stream can store
information in a particular format (ie: an index) that is common to one
indexing app/library. Such an index can be used by ANY app that knows the
index format to search the document. This is almost exactly what MS will do
(if they haven't done it already) with the File Indexing Service. As it's ONE
library, then any new user app that creates data can add index creation by
adding one library. And any app that wants to search these indexes would need
only to add one library, not every library for every format that it wants to
search. It's essentially an n^2 vs 2n problem.
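
The counting argument can be written out as a small sketch. It assumes
"apps" searching applications and "formats" document formats; the
function names are made up for illustration. With both numbers equal to
n, the two counts are n^2 and 2n.

```python
def parser_integrations(apps, formats):
    """Without a shared index, every app links one parser per format."""
    return apps * formats


def shared_index_integrations(apps, formats):
    """With a shared index format, each format writes it once and
    each app reads it once."""
    return apps + formats


if __name__ == "__main__":
    n = 10
    print(parser_integrations(n, n), shared_index_integrations(n, n))
```

For ten apps and ten formats that is 100 integrations against 20.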

> A document format may also contain fields specifying who made it, or
> even a log of who modified it and when.

Exactly, but then you need to:
1. Read in the Document, and in some formats that are either compressed or
that do not have a fixed format, this may mean you end up reading most of or
even an entire file to get this data. This is expensive on CPU, disk cache,
disk access and transfer times with large files, particularly if it's a slow
data bus to the storage device.
2. Parse out the information you want, which means that you need to
understand the format. With an index stream, in a fixed format, you only need
one library that can handle the format for you. Without streams, you use lots
of libraries. By using lots of libraries, you end up using lots of memory to
hold them, and disk access time/transfer speed to load them.

> It seems to me that streams are more useful for stuff that the file's
> main application doesn't deal with itself. Such as attaching icons that
> follow the file around.

Icons themselves wouldn't be that good for multi-user environments. Attaching
an icon type designator that then allows the app that wants to display an
icon for the file to pull the appropriate icon from somewhere else (and
therefore can be matched to whatever theme or specific icon the user wants)
is probably a better idea.

> >If streams were always to be considered volatile, then you could do all
> > sorts of interesting things with them. Any disk cleanup mechanism you
> > have could also reap old streams specifically if the disk gets below a
> > certain amount free.
>
> That would limit streams to caching use _only_, disks fill occasionally
> and we can't have _useful_ stuff disappearing at random.

If they transfer a file with streams on Windows using standard (POSIX-based)
tools at the moment, then they lose this information anyway. And application
developers aren't all suddenly going to drop whatever it is they're doing to
implement stream support as soon as it appears. Things take time, and until
the majority of applications support it, it's useless to assume that the data
is not going to disappear because of a user using a non-updated tool. By
limiting it to caching, at least in the short term, no one is going
to (hopefully!) get bitten by data loss. Later (and I'm guessing a LOT
later), who knows, streams could actually become useful for more than just
caching.

--
Stuart Young (aka Cef)
[email protected] is for LKML and related email only

2004-09-04 01:10:38

by Brad Boyer

Subject: Re: silent semantic changes with reiser4

On Thu, Sep 02, 2004 at 10:48:46AM +0100, Alan Cox wrote:
> What I don't understand is the tie between Linux having such streams and
> Windows doing it for Samba to work. Netatalk has always handled this for
> Macintosh and portably. Presumably any Samba support would need to
> handle OS's without wacky files for portability too ?

I'm not 100% sure on the samba side, but I think there is a pretty
significant difference. On the Mac, the problem of copying forks and
metadata onto non-Mac systems was recognized early on. There are several
standard formats for serialized versions of this data. If you take the
files that netatalk writes and copy them directly to a Mac separately,
there are tools that can convert them back to the original format with
all the data intact. I've never seen such a thing for NTFS named streams.

Brad Boyer
[email protected]

2004-09-04 06:04:03

by David Masover

Subject: Re: silent semantic changes with reiser4

Brad Boyer wrote:
| On Thu, Sep 02, 2004 at 10:48:46AM +0100, Alan Cox wrote:
|
|>What I don't understand is the tie between Linux having such streams and
|>Windows doing it for Samba to work. Netatalk has always handled this for
|>Macintosh and portably. Presumably any Samba support would need to
|>handle OS's without wacky files for portability too ?
|
| I'm not 100% sure on the samba side, but I think there is a pretty
| significant difference. On the Mac, the problem of copying forks and
| metadata onto non-Mac systems was recognized early on. There are several
| standard formats for serialized versions of this data. If you take the
| files that netatalk writes and copy them directly to a Mac separately,
| there are tools that can convert them back to the original format with
| all the data intact. I've never seen such a thing for NTFS named streams.

I had an idea for how to solve this. Search the archives for
"serialization" or "serialize".

Basically, it involves creating a better interface to a better "dump".
You do

    cat foo/serialize > foo.img

and send foo.img over the network. Or you just tell mutt/scp/whatever
to grab foo/serialize instead of foo. Can't be worse than a Mac
resource fork, and it looks more user-friendly to me.
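
A minimal userspace sketch of the "serialize to one image" idea,
assuming the streams of a file are available as a name-to-bytes mapping.
It uses a tar archive as the image format purely for illustration; it
does not implement the proposed foo/serialize kernel interface, and the
function and stream names are hypothetical.

```python
import io
import tarfile


def serialize(streams):
    """Pack a {stream name: bytes} mapping into a single tar image."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(streams):
            data = streams[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def deserialize(image):
    """Unpack an image made by serialize() back into a mapping."""
    streams = {}
    with tarfile.open(fileobj=io.BytesIO(image), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                streams[member.name] = tar.extractfile(member).read()
    return streams


if __name__ == "__main__":
    original = {"data": b"main document body", "summary": b"author: someone"}
    image = serialize(original)  # this single blob is what gets sent
    print(deserialize(image) == original)
```

The image is an ordinary flat file, so tools that know nothing about
streams (ftp, scp, a mail client) can carry it without losing anything.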

2004-09-07 11:48:22

by Herbert Poetzl

Subject: Re: silent semantic changes with reiser4

On Thu, Sep 02, 2004 at 11:22:41AM +1200, Oliver Hunt wrote:
> The loss of forks in the file is exactly the problem you used to have
> when transferring native Mac files to a PC...
>
> This meant that in order to transfer files to a different filesystem you
> often needed to tar/zip/whatever them first.
>
> Bear in mind this would let us do the whole MacOS thing of putting an
> entire application (plus plugins, etc.) inside one "file"...

Hmm, well, recent MacOS (X) does not put applications in one
"file"; it simply stores related stuff inside a directory which
is presented as one unit by the GUI ... something I consider
useful, and which allows you to 'move' an application around
without rendering it useless ...

best,
Herbert

> --Oliver
>
> On Wed, 1 Sep 2004 13:51:40 -0700, Jeremy Allison <[email protected]> wrote:
> > On Wed, Sep 01, 2004 at 09:47:46PM +0100, Jamie Lokier wrote:
> > > Jeremy Allison wrote:
> > > > > I meant when I copy not using Samba. For example, I copy the .doc
> > > > > file in Windows NT to an FTP server.
> > > > >
> > > > > Does the FTP operation magically linearise the .doc streams on demand?
> > > > > Or does FTP lose part of the Word document?
> > > >
> > > > Good question. It depends if the Microsoft ftp client is streams-aware,
> > > > and understands the Microsoft OLE structured storage format and will do
> > > > the linearisation on demand or not. I must confess I haven't tested this,
> > > > as I don't ever run Windows other than on vmware sessions for Samba testing
> > > > these days :-).
> > > >
> > > > Probably a non-Microsoft ftp client would lose part of the word doc.
> > >
> > > So you're saying SCP, CVS, Subversion, Bitkeeper, Apache and rsyncd
> > > will _all_ lose part of a Word document when they handle it on a
> > > > Windows box?
> > >
> > > Ouch!
> >
> > Yep. It's the meta data that Word stores in streams that will get lost.
> >
> >
> >
> > Jeremy.