curl-library

Re: curl-commits mailing list usefulness, and some git rambling

From: Yang Tse <yangsita_at_gmail.com>
Date: Fri, 26 Mar 2010 12:52:43 +0100

2010/3/26, Jamie Lokier wrote:

> Yang Tse wrote:
> > content is provided in files and files have a history and as
> > such a revision number.
>
> Git history is not linear, and is updated independently at different
> locations without having to be online all the time, so a global
> incrementing version number is fundamentally impossible.

Given a single repo with more than one branch, a given file might have
a different life on each branch. The file could perfectly well have a
revision number that includes a branch Id for each branch without much
trouble. The fun starts when two branches are merged, and with how the
merge should be considered conceptually.

One option would be to consider there exists a donor branch and a
recipient branch. In this case the merged file would continue with
recipient branch Id and next revision number for that branch.

Another option would be to consider that both branches are donors and
that the merging 'creates' a third unified branch. In this case the
branch Id would be a new one and the revision number part would start
over for the merged file.

With disconnected repos it would be something similar, but a repo Id
would also become part of the 'file revision number'. That implies the
revision number of a given file would be a three-part thingy:
'repo_id.branch_id.rev_id'.
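
A minimal Python sketch of that three-part identifier and of the two
merge options described above. All names here (FileRevision,
merge_into_recipient, merge_into_new_branch) are hypothetical, and two
details are assumptions rather than anything stated above: numbering
restarts at 1 on a new branch, and a merged file carries the id of the
repo where the merge was performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRevision:
    """Hypothetical three-part file revision: repo_id.branch_id.rev_id."""
    repo_id: str
    branch_id: str
    rev_id: int

    def __str__(self) -> str:
        return f"{self.repo_id}.{self.branch_id}.{self.rev_id}"

    def next(self) -> "FileRevision":
        """Ordinary commit on the same branch: bump the revision part."""
        return FileRevision(self.repo_id, self.branch_id, self.rev_id + 1)


def merge_into_recipient(donor: FileRevision,
                         recipient: FileRevision) -> FileRevision:
    """Option 1: donor/recipient. The merged file keeps the recipient's
    repo and branch ids and takes that branch's next revision number.
    The donor's id does not influence the result."""
    return recipient.next()


def merge_into_new_branch(a: FileRevision, b: FileRevision,
                          repo_id: str, new_branch_id: str) -> FileRevision:
    """Option 2: both branches are donors. The merge 'creates' a new
    branch and the revision part starts over (assumed to start at 1)."""
    return FileRevision(repo_id, new_branch_id, 1)


trunk = FileRevision("repoA", "trunk", 7)
feature = FileRevision("repoA", "feature", 3)

print(merge_into_recipient(feature, trunk))
print(merge_into_new_branch(feature, trunk, "repoA", "unified"))
```

Either policy yields an identifier that is only unique as long as repo
and branch ids are themselves unique, which is the part that needs
agreement between repos.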

Whether those designing a VCS wish to use the above or not is a
different matter. But it should be doable.

As a matter of fact commit time-stamps in a DVCS will most probably be
those of the merge point when two repos are merged and not the
time-stamp of the commit that took place when in disconnected state.
It is a similar problem.

The big fallacy of nearly all so-called 'distributed' VCS is calling
them distributed; they should simply be called mostly-disconnected.
The longer the repos stay disconnected without merging contents, the
less inclined I am to call the whole thing a VCS. In the end a project
needs an authoritative repository.

> > Another thing that shows that git is yet in its infancy is the use of
> > internal keys as external references. GUID's are fine as primary keys
> > in a database,
>
> They aren't internal keys, and they aren't GUIDs!
>
> They are strong crypto hashes of the entire history up to that point.
>
> And yes, they are checked.

OK, my mistake. They are hashes.
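
To illustrate the point about hashes of the entire history, here is a
simplified Python sketch of a hash-chained identifier. It is not git's
actual object format (git's commit objects record more, such as a tree,
author and dates); the function name commit_id and the byte layout are
assumptions made for illustration only.

```python
import hashlib


def commit_id(parent_ids: list[str], content: bytes, message: str) -> str:
    """Derive an id from the parents' ids plus this commit's own data.

    Because each parent id was derived the same way, the result depends
    on everything that came before it in the history.
    """
    h = hashlib.sha1()
    for parent in parent_ids:
        h.update(b"parent " + parent.encode() + b"\n")
    h.update(b"message " + message.encode() + b"\n")
    h.update(content)
    return h.hexdigest()


root = commit_id([], b"hello\n", "initial import")
child = commit_id([root], b"hello world\n", "extend greeting")

# The same inputs give the same id in any repo, online or not.
assert child == commit_id([root], b"hello world\n", "extend greeting")

# Changing anything earlier in the history changes every later id.
tampered_root = commit_id([], b"hullo\n", "initial import")
tampered_child = commit_id([tampered_root], b"hello world\n", "extend greeting")
assert tampered_child != child
```

No counter and no central server is involved, which is why such an id
can be computed while disconnected.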

> > but only for internal purposes. Exposing these as the primary way of
> > identifying elements, no matter if these are code commits, financial
> > transactions or cake recipes makes little sense, except maybe for
> > debugging purposes.
>
> Good luck arguing that one with git users :-)

Nah, it is simply my opinion, mostly for personal consumption. I'm not
involved in git design or anything related to it.

It might be valuable for some, at some moments, to have very low-level
VCS commands available at their fingertips. But most users simply need
a high-level interface that lets them perform basic operations without
giving every committer the real capability of screwing up the
repository. Obviously, if the project implements a pull-only policy,
and the ones doing the pulls have practically written the VCS
themselves, the risk of screwing up the authoritative repository
diminishes. Anyway, for me, it is a disconnected VCS, not a distributed
one.

> What other identifier can you use to identify a point in history
> globally, without requiring every user to be online when they commit
> things locally?
>
> That's a serious question, btw. If you have a good solution I'd be
> interested.

Globally, across disconnected and unsynchronized repos, it does not
exist, because disconnected and unsynchronized repos need not have
anything in common beyond the point at which they forked. When they
are merged they can agree on whatever global identifier they wish to
share. But as soon as they disconnect again they cannot assume
anything about the other repo: it might stay unchanged, get wiped out
or even become corrupted. Each repo has its own life.

> All the VCSs based on things like CVS/SVN two-way replication have a
> worse problem: version numbers increment, but are different at each
> location. When someone tells you to look up version x.y.z, you have
> to use the master server holding x.y.z or you can't look it up. What
> if it's down, corrupt, or you're working offline? Then you're stuck.

Yep, that's the reason I said that post-commit hooks could be used as
a poor man's replication system. It may work in some setups, depending
a lot on the skill of those deciding how it is deployed. It has
shortfalls: the more manual intervention a given VCS requires to
perform a single merge, the more likely it is to require additional
manual administration in a multi-master setup. Single-master with
multiple replicas should work more smoothly and be easier to manage.
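
A toy, in-memory Python sketch of the single-master case: a post-commit
hook forwards each commit to the replicas, which accept the master's
revision number verbatim so that a given revision means the same thing
everywhere. The Repo class and its methods are hypothetical and do not
correspond to the hook interface of any real VCS.

```python
from typing import Callable


class Repo:
    """Toy repository: an ordered list of (revision number, change)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.history: list[tuple[int, str]] = []
        self.post_commit_hooks: list[Callable[[int, str], None]] = []

    def commit(self, change: str) -> int:
        """Master side: assign the next number, then run the hooks."""
        rev = len(self.history) + 1
        self.history.append((rev, change))
        for hook in self.post_commit_hooks:  # runs after the commit is stored
            hook(rev, change)
        return rev

    def apply_replicated(self, rev: int, change: str) -> None:
        """Replica side: accept the master's revision number verbatim."""
        if rev != len(self.history) + 1:
            raise RuntimeError(f"{self.name} is out of sync at r{rev}")
        self.history.append((rev, change))


master = Repo("master")
replicas = [Repo("replica-a"), Repo("replica-b")]
for replica in replicas:
    master.post_commit_hooks.append(replica.apply_replicated)

master.commit("fix typo in README")
master.commit("add --verbose option")
```

The out-of-sync check is where the manual administration would come
in: a replica that missed a commit refuses further updates until
someone repairs it.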

The fact is that a truly distributed VCS is not something widespread,
even though it is possible to build. The Paxos and Mencius algorithms
are the key to this: http://en.wikipedia.org/wiki/Paxos_algorithm
Some of the external links on that Wikipedia page are _really_
interesting.

VCS-related systems I've been able to locate include, for example:

Reliable Software's Code Co-op http://www.relisoft.com

WANdisco's CVS-multisite, SVN-multisite and JIRA-multisite
http://www.wandisco.com

Cheers,

-- 
-=[Yang]=-
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2010-03-26