Question:
How hard would it be to implement a volume manager with an extended
API? I want to write a tightly-integrated volume manager/distributed
lock manager to arbitrate lock-free, atomic read/compare/write
operations with other computers on the network. I also need this API
to be available to user-space applications. How would I do this?
Details follow, if you're curious.
The idea is to make a distributed filesystem that won't hiccup when a
client node crashes. I've decided to implement all the gritty work in
the volume manager, and just design the filesystem to take advantage
of it.
Having done my own tests, I've determined that a 5msec timeout to
determine whether a host has crashed would be appropriate. (My test
basically looked at ping times over a saturated network: they averaged
about 600 microseconds and topped out at 3msec over 10 minutes.) I
figure a 5msec timeout won't add any noticeable lag to the volume
manager, as most disk seek times are in that range.
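For what it's worth, here is a minimal sketch of the kind of round-trip measurement I did, using a UDP echo loop instead of ICMP ping (the `udp_echo_server` and `measure_rtt` names are my own, and the loopback demo stands in for a real peer on the network):

```python
import socket
import statistics
import threading
import time

def udp_echo_server(sock):
    # Echo each datagram straight back to its sender.
    while True:
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)

def measure_rtt(addr, samples=50):
    # Send small probes and time each round trip, ping-style.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    rtts = []
    for _ in range(samples):
        t0 = time.perf_counter()
        s.sendto(b"probe", addr)
        s.recvfrom(1024)
        rtts.append(time.perf_counter() - t0)
    s.close()
    return statistics.mean(rtts), max(rtts)

# Loopback demo; a real test would target another host on a
# saturated network and run for minutes, not milliseconds.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
threading.Thread(target=udp_echo_server, args=(srv,), daemon=True).start()
mean_rtt, max_rtt = measure_rtt(srv.getsockname())
print("mean %.3f ms, max %.3f ms" % (mean_rtt * 1e3, max_rtt * 1e3))
```

The mean and max are what I compared against the proposed timeout.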
Anyway, we basically have the volume manager asking everyone else on
the network whether they have a particular area locked. If so, wait
for it to be unlocked. If not, record and broadcast the lock, perform
the atomic read/compare/write operation while holding it, and unlock
before returning to the client (filesystem).
I need this code's API to be available to userspace for an
application that I cannot discuss because of an informal NDA.
Thank you,
- Alex Austin, Circuitsoft Computer Services
"From Windows to MacOS, and the Linux in between"
On Mon, 01 May 2006 00:26:05 CDT, Circuitsoft Development said:
> Having done my own tests, I've determined that a 5msec timeout to
> determine if a host has crashed would be appropriate. (My test was
> basically looking at ping time over a saturated network - averaged
> about 600 microseconds, topped at 3msec over 10 minutes) I figure that
> 5msec timeout won't add any noticeable lag to the volume manager, as
> most disk seek times are in that range.
Note that if you're setting 5ms as your timeout for detecting a *crash*,
and your *ping* takes 3ms, that leaves you a whole whopping 2ms. If you
have 1ms scheduler latency at *each* end (remember - you're in userspace
at both ends, right?) you have approximately 0ms left for the remote end to
actually *do* anything, and for the local end to process the reply.
And if the remote end has to issue a syscall while processing the request,
you're basically screwed.
You need to be adding at least 1 zero to that timeout value.
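The budget above works out like this (the per-end scheduler latency is the assumed 1ms figure from the paragraph before):

```python
# Latency budget for the proposed 5 ms crash-detection timeout,
# using the numbers from this thread.
timeout_ms = 5.0
worst_ping_ms = 3.0        # measured peak RTT on a saturated network
sched_latency_ms = 1.0     # assumed scheduler latency per end
ends = 2                   # userspace at both ends

slack_ms = timeout_ms - worst_ping_ms - ends * sched_latency_ms
print("slack at 5 ms: %.1f ms" % slack_ms)    # prints "slack at 5 ms: 0.0 ms"

# Adding a zero to the timeout, as suggested:
slack_50_ms = 50.0 - worst_ping_ms - ends * sched_latency_ms
print("slack at 50 ms: %.1f ms" % slack_50_ms)
```

Zero milliseconds of slack means any real work on either end pushes a live host past the "crashed" threshold.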