Date: Mon, 6 Jul 2015 02:08:24 +0100
From: Al Viro <viro@ZenIV.linux.org.uk>
To: jon <jon@jonshouse.co.uk>
Cc: Valdis.Kletnieks@vt.edu, coreutils@gnu.org, linux-kernel@vger.kernel.org
Subject: Re: Feature request, "create on mount" to create mount point
 directory on mount, implied remove on unmount
Message-ID: <20150706010824.GY17109@ZenIV.linux.org.uk>
References: <1435924919.6501.432.camel@jonspc>
 <172423.1436043394@turing-police.cc.vt.edu>
 <1436050108.6501.509.camel@jonspc>
 <20150705142936.GW17109@ZenIV.linux.org.uk>
 <1436111210.16546.29.camel@jonspc>
 <20150705173925.GX17109@ZenIV.linux.org.uk>
 <1436139348.16546.290.camel@jonspc>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1436139348.16546.290.camel@jonspc>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 3802
Lines: 68

On Mon, Jul 06, 2015 at 12:35:48AM +0100, jon wrote:
> > Anyway, the underlying model hasn't changed much since _way_ back; each
> > thread of execution is a virtual machine of its own, with actual CPUs
> > switched between those.
> Ok, not sure I quite follow. What do you mean by virtual machine?
> My understanding was that a true VM has a hypervisor and I thought it also
> required some extra processor instructions to basically do an "outer"
> context switch (and some memory fiddling to fake up unique address
> spaces) while the scheduler of the operating system within the VM is
> doing the "inner" context switch (i.e. push/pop all on Intel style CPU).
> Not all architectures have any VM capability.
> Are you talking about kernels on Intel with SMP enabled only?

Anything timesharing, starting with the 7094 running CTSS.  Hypervisors let
you virtualize the privileged-mode parts of the processor; that's a different
beast.

Each process sees a CPU and memory of its own; what the kernel does to give it
such an illusion depends upon the system (up to and including a full write of
registers and user memory to disk, followed by reading those of the next
process back from disk - remember where the name "swap" had originally come
from?), but no matter how you do that, you give each process a virtual CPU of
its own and multiplex the actual processor(s) between those.

From the process' point of view a system call is just a weird instruction that
has rather convoluted side effects.  The fact that it actually triggers
a trap on the underlying hardware CPU, switches to kernel mode, marshals
the arguments, arranges execution environment for C code, executes it, etc.
is immaterial - as far as userland code is concerned, the kernel is a black
box.  For all it cares, there might be another CPU sitting there, with
entirely different instruction set and something running on it.  With
"system call" insn on your CPU raising a signal to attract attention of
the privileged one and stopping itself until the privileged one tells it to
resume.
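
To make the "weird instruction" view concrete, here is a minimal sketch,
assuming Linux with glibc.  The helper name raw_getpid() is made up for
illustration; syscall(2) and SYS_getpid are the real interfaces.  All the
caller ever sees is a request number going in and a result coming out:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/syscall.h>   /* SYS_getpid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall() */

/*
 * Hypothetical helper: ask the kernel for our PID by system call number,
 * bypassing the getpid() library wrapper.  From userland this is one opaque
 * operation; the trap, the mode switch and the kernel C code behind it are
 * not visible here.
 */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Calling raw_getpid() should give the same value as getpid(); nothing in the
calling code depends on how the kernel arrives at the answer.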

That model is considerably older than hypervisors (and both are much older
than x86).

> > Parts of those virtual machines can be shared - e.g. you can have descriptor
> > table not just identical to that of parent at the time of clone(), but
> > actually shared with it, so e.g. open() in child makes the resulting descriptor
> > visible to parent as well.
> Ok, I follow you. I often don't need anything more complex than fork(),
> when I thread I use pthreads so have not dug around into what is
> actually happening at the kernel level.  I was not aware that the parent
> could see file descriptors created by the child, is this always true or
> only true if the parent and child are explicitly a shared memory
> process?

It is true if and only if clone(2) gets CLONE_FILES in its arguments.
Sharing address space is controlled by CLONE_VM and these can be used
independently; pthreads set both at the same time, but you can have shared
descriptor table without shared memory and vice versa.  Most of the time
you use shared descriptor tables, you want shared memory as well, but
it's not universally true.
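
A minimal sketch of the "shared descriptor table without shared memory"
case, assuming Linux on x86-64 or arm64 (flags are the first argument of the
raw clone(2) call there) and that /dev/null can be opened.  The function name
child_fd_visible_in_parent() is made up for illustration.  With a zero stack
argument and no CLONE_VM, raw clone(2) behaves like fork(): the child runs on
a copy of the parent's stack.  Since memory is not shared, the child reports
the descriptor number through its exit status:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>         /* open(), fcntl() */
#include <linux/sched.h>   /* CLONE_FILES */
#include <signal.h>        /* SIGCHLD */
#include <sys/syscall.h>   /* SYS_clone */
#include <sys/wait.h>      /* waitpid() */
#include <unistd.h>        /* syscall(), _exit(), close() */

/*
 * Hypothetical helper: returns 1 if a descriptor opened by the child is
 * valid in the parent afterwards, 0 if it is not, -1 on error.
 * Pass CLONE_FILES to share the descriptor table, 0 to get a private copy.
 */
int child_fd_visible_in_parent(unsigned long extra_flags)
{
    /* Raw clone(2); no CLONE_VM, so the zero stack argument is safe. */
    long pid = syscall(SYS_clone, extra_flags | SIGCHLD, 0L, 0L, 0L, 0L);
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: open a file and report its descriptor number. */
        int fd = open("/dev/null", O_RDONLY);
        _exit(fd >= 0 && fd < 255 ? fd : 255);   /* 255 means failure */
    }

    /* Parent: collect the descriptor number from the exit status. */
    int status;
    if (waitpid((pid_t)pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) == 255)
        return -1;

    int fd = WEXITSTATUS(status);
    int visible = fcntl(fd, F_GETFD) != -1;   /* fails with EBADF if not ours */
    if (visible)
        close(fd);
    return visible;
}
```

child_fd_visible_in_parent(CLONE_FILES) should return 1, since the child's
open() lands in the table the parent is using, while
child_fd_visible_in_parent(0) should return 0, since the child only modified
its own copy.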

> Ok, I follow that :-) But logically it must be done with two functions
> or handlers or something, so I would assume that my proposed "remove
> mount directory" would simply hang off whatever call truly discards the
> file system from the kernel.

Er...  _Which_ mount directory would you have removed (and what's to
guarantee that all filesystems it had been mounted on are still alive
when the last mount goes away)?