Monday, December 23, 2013

What do you lose by moving to distributed version control from Subversion? When is using Subversion the right choice?

This is a follow-up post with a few counterpoints to the last post about distributed version control. There are a few things that I can see that Subversion does better than either Git or Mercurial. In the interest of fairness, I will point them out here:

1.  It appears that Subversion, given adequate server hardware and adequate local area network bandwidth, may be a better solution for teams who need to do checkouts larger than 2 gigabytes, especially if you simply must check in binaries and version control them. Your mileage may vary: in your situation, you may find that the point of inflection is 750 megabytes. A common example of this is video-game development, where assets easily hit 10-20 gigabytes and are commonly versioned with Subversion.

2. Subversion has support for partial and sparse checkouts, something that you don't get with distributed version control systems, and all the attempts to add sparse checkouts to DVCS have been so flawed that I would not use them. The nearest relevant and useful DVCS equivalent is submodules. Most users who need to do partial checkouts in Subversion will find that they want to investigate using submodules in a DVCS. If submodules do not meet your needs, then maybe CVCS is best for your team. If you need different users to have different subsets of the repo, using scatter/gather workflows, or otherwise need sparse checkouts (svn co http://server/trunk/path/to/sub/module3 rather than being forced in Git or Mercurial to do a clone, which is roughly equivalent to svn co http://server/trunk/ ), you may find Subversion meets your needs better. It is a common rookie mistake to conflate DVCS repo scope with CVCS repo scope. DVCS repos are typically smaller and simpler by design, in contrast to the "this server is your whole code-universe" monster-mega-repo strategy that Subversion limits you to.

3.  Subversion has global commit numbering. That is, your ONE and only Subversion server has a commit counter, and since this global asset is shared among everybody, you can never have a "commit #3" on one computer be anything other than "commit #3" on anyone else's computer. On Git and Mercurial the system instead generates globally unique hashes to identify commits, and the commit numbers, if available, should generally just be ignored as they are different for each user. For some workflows you might find this global commit numbering suits you better than referring to commits by hex strings (40 characters in full, usually abbreviated to the first 7 to 12), which have no ordering.
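To illustrate point 3, here is a minimal toy sketch in Python of why local commit numbers differ between clones while hashes stay the same. It is not how Git or Mercurial actually store commits: `commit_id`, `ToyClone`, and the commit messages are all hypothetical names made up for this example, and the only assumption is that a commit's identifier is a hash of its content and its parent.

```python
import hashlib


def commit_id(parent_id: str, message: str) -> str:
    """Toy commit hash: depends only on the parent id and the message."""
    data = f"parent {parent_id}\n{message}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


class ToyClone:
    """A hypothetical clone that numbers commits in the order it receives them."""

    def __init__(self) -> None:
        self.order = []  # commit ids, in local arrival order

    def add(self, cid: str) -> None:
        if cid not in self.order:
            self.order.append(cid)

    def local_number(self, cid: str) -> int:
        return self.order.index(cid)


# Two developers each commit on top of the same root commit.
root = commit_id("", "initial import")
fix = commit_id(root, "fix crash on startup")
feature = commit_id(root, "add export dialog")

# Each clone receives the same commits, but in a different order.
alice, bob = ToyClone(), ToyClone()
for cid in (root, fix, feature):
    alice.add(cid)
for cid in (root, feature, fix):
    bob.add(cid)

# Same commit, same hash, different local number in each clone.
print(alice.local_number(fix), bob.local_number(fix))  # prints "1 2"
print(fix[:12])  # abbreviated hash, identical on both clones
```

The hash is computed from the commit itself, so every clone agrees on it without asking a central server. The local number depends on arrival order, which is exactly why it is not safe to share.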

If I've missed anything, I'll add it in an update. Anyways, those are the three golden reasons I know of why some teams will evaluate DVCS and then stick right where they are with Subversion, which, by the way, does seem to me to be the best open-source CVCS out there right now. I only wish there were a better GUI than Tortoise, and that they'd do a little work to make command-line merging less stupid.

Update: Someone was confused about why you would want users to "generate" hash keys. This means I didn't explain it properly. The version control system generates the hash keys; "hashing" means feeding the contents of your changeset through a cryptographic hash function. The probability of a hash collision is so low that you will never encounter one. Git and Mercurial both use hashes, and I have never heard of a hash collision, ever. My only reason for mentioning it is that in a distributed system there is no single incrementing counter available to give you unique incrementing numbers. Not a big deal, but something to know before you leap. More info here.
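To make "the version control system generates the hash" concrete, here is a small Python sketch that reproduces the identifier Git assigns to a file's contents (a "blob") in its classic SHA-1 object format: the SHA-1 of a short header followed by the bytes. The function name `git_blob_id` is my own; commit hashes are computed in the same spirit, but over the commit's metadata, tree, and parent ids. Newer Git repositories can optionally use SHA-256 instead, which this sketch does not cover.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id Git assigns to a blob with this content."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty file has the same id in every Git repository on every machine.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Any change to the content produces a completely different id.
print(git_blob_id(b"hello world\n"))
print(git_blob_id(b"hello world!\n"))
```

If you have Git installed, `git hash-object <file>` should print the same value for the same bytes. Nobody chooses the identifier; it falls out of the content.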

Update 2:  Today I spent some time fighting Subversion working copy folder corruption. Issues like this one that were a big problem in 2008 and 2010 are still a big problem in 2014.  That's bad news for Subversion users.

Update 3: The big thing you'll lose when you leave Subversion behind is all the bugs and the missing features. Subversion and TortoiseSVN are giant piles of bad design and festering technical debt, and the parts that work bug-free still have glaring functional deficiencies. I don't think I'd miss Subversion one bit. If I could move the projects that use it to something else, I would.

Update 4 (2016): Subversion is junk, full of client- and server-side bugs. I take back most of my compliments above; I'm sick of Subversion and want to kill it with fire. Git can do sparse checkouts, and with Git and GitLab you can even lock files (a bad idea, but technically possible now). So there are zero technical reasons to keep using Subversion.



7 comments:

  1. Imprnt.in also offer a price promise so if you find products here that are cheaper somewhere else, then we promise to do everything to ensure that our prices down. So, what you're looking for when it comes to your CCTV camera and Access Control System , you are bound to find the perfect product to suit your needs with us.

  2. "On Git and Mercurial we use hash tags instead to identify commits, and the commit numbers, if available, should generally just be ignored as they are different for each user."

    Could you give an example of this, please? How do you choose a hashtag and guarantee it's unique?

    Or, another way of asking the same question: how do you reliably uniquely identify a specific revision, in order to, say, identify a build? In SVN we use the revision number to identify the build, as a build number.

  3. You don't choose one; the version control system creates and assigns these values, and checks for collisions. Collisions are very rare (almost nonexistent) and even if they do occur, the version control system works around those cases.

  4. Worth mentioning is that Subversion does cherry-pick tracking.
    You can ask Subversion which revisions you have already merged from a certain branch.
    You could ask e.g. Git, but it only matches commits with the same patch-id (that is, commits that make the same change).
    Where I work, we use this extensively, and it would cause trouble if we moved to a DVCS.

  5. 1) Neither one nor the other. I wouldn't put something more than 1MB on SVN, it is just sooooo SLOW. Git seems faster but no progress, no resume :(
    I would rather say keep your fat-ass assets on FTP+SSH or VPN+rsync

    2) no idea what type of animal is that and whether I really want to know :) or left alone taint it.

    3) This is indeed a point bugging me too. Every change, whether on trunk or a branch, has a unique incremented number. Maybe in Git you can achieve the same using the commit count? I think you can cook it yourself, but one cannot expect to have only one true solution in the case of DVCSs.

  6. With Mercurial, you can get SVN-like behavior for large binary files. The bundled largefiles extension (http://mercurial.selenic.com/wiki/LargefilesExtension) allows you to mark files as "large". Mercurial will then not pull down the entire history for these files; you only get the version needed for the working copy. When you "hg update", Mercurial will fetch the files from the server as needed. That feature was pushed forward by Unity, a game development company.

    Apart from that, I think you mention some good points!

  7. This comment has been removed by a blog administrator.
