If you are familiar with version control systems such as Subversion, you might think of boar as "version control for large binary files". But keep reading, because there is more to it.
Boar stores snapshots of directory trees in a local or remote repository and provides tools to ensure that your data is consistent and complete. You can keep just some or all of your data checked out for viewing and editing.
The repository has a simple layout to ensure that the data can easily be extracted even if the original software should be unavailable. This simplicity makes boar ideal for data that needs safe long-term storage.
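The consistency checking described above can be illustrated with a checksum manifest. This is a minimal sketch, not boar's actual repository format or API: the function names `file_digest`, `snapshot` and `verify` are hypothetical, and SHA-256 is chosen here only for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large binaries never sit fully in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map each file path (relative to root) to the digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return the sorted paths that are missing or whose contents changed."""
    current = snapshot(root)
    return sorted(path for path, digest in manifest.items()
                  if current.get(path) != digest)
```

Calling `verify(root, snapshot(root))` right after taking a snapshot returns an empty list; any file that is later altered or deleted shows up in the result.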
ontwik.com/python/disqus-scaling-the-world%e2%80%99s-largest-django-application/?utm_medium=twitter&utm_source=twitterfeed, posted 2011 by peter in development distributed hosting python scalability storage toread video
Disqus, one of the largest Django applications in the world, will explain how they deal with scaling complexities in a small startup.
There are many benefits to keeping a lightweight stack. At Disqus, keeping the stack thin helps us scale Django to reach over 125 million unique visitors a month with just a small team of engineers. Avoiding complicated software packages until needed reduces unnecessary overhead, and has let us stay nimble and use new capabilities in Django (e.g., database routing) and other software as they arise. The talk will cover key parts of the architecture and development process at Disqus, including databases (relational and non), queues, automated testing, and continuous deployment.
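For readers unfamiliar with the database routing mentioned above, a Django router is a plain class with a few hook methods, listed in the `DATABASE_ROUTERS` setting. The sketch below is an assumption for illustration only and is not taken from the Disqus talk: the class name and the aliases `"default"` and `"replica"` are hypothetical and would have to match keys in a project's `DATABASES` setting.

```python
class PrimaryReplicaRouter:
    """Send reads to a replica and writes to the primary database.

    The aliases "default" and "replica" are assumed names, not part of
    the excerpt above.
    """

    def db_for_read(self, model, **hints):
        # All reads go to the (assumed) read replica.
        return "replica"

    def db_for_write(self, model, **hints):
        # All writes go to the primary.
        return "default"

    def allow_relation(self, obj1, obj2, **hints):
        # Both aliases hold the same data, so any relation is acceptable.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Apply schema changes on the primary only; the replica follows it.
        return db == "default"
```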
www.webupd8.org/2011/02/how-to-boot-iso-with-grub2-easy-way.html, posted 2011 by peter in howto linux storage
If you want to try out a new Linux distro, be it the latest Ubuntu 11.04 Natty Narwhal daily ISO or any other (I've only tested it with Ubuntu though!) and don't want to burn a CD each time you want to try a new daily build (and you don't have a USB memory stick around), you can use a cool GRUB 2 feature that lets you boot a live CD ISO directly from your hard disk. You can also use this method to boot various utilities such as Super Grub Disk, SystemRescueCD, Parted Magic and so on.
Redis is an advanced key-value store. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis also supports several kinds of sorting.
In order to be very fast but at the same time persistent, the whole dataset is kept in memory and either saved to disk asynchronously from time to time (semi-persistent mode) or, alternatively, every change is written to an append-only file (fully persistent mode). Redis is able to rebuild the append-only file in the background when it gets too big.
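The append-only approach can be sketched with a toy store. This is not how Redis is implemented: `AppendOnlyStore`, its JSON log format, and the synchronous `rewrite` method are all assumptions made for illustration (Redis performs its rewrite in the background).

```python
import json
import os


class AppendOnlyStore:
    """Toy in-memory key-value store that logs every change to a file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            self._replay()  # rebuild the in-memory state from the log
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        with open(self.path, encoding="utf-8") as fh:
            for line in fh:
                op, key, value = json.loads(line)
                if op == "set":
                    self.data[key] = value
                elif op == "del":
                    self.data.pop(key, None)

    def _append(self, op, key, value=None):
        # Write the change to disk before touching the in-memory state.
        self._log.write(json.dumps([op, key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())

    def set(self, key, value):
        self._append("set", key, value)
        self.data[key] = value

    def delete(self, key):
        self._append("del", key)
        self.data.pop(key, None)

    def get(self, key, default=None):
        return self.data.get(key, default)

    def rewrite(self):
        """Compact the log to a single 'set' record per live key."""
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            for key, value in self.data.items():
                fh.write(json.dumps(["set", key, value]) + "\n")
            fh.flush()
            os.fsync(fh.fileno())
        self._log.close()
        os.replace(tmp, self.path)  # swap in the compacted log
        self._log = open(self.path, "a", encoding="utf-8")

    def close(self):
        self._log.close()
```

Reads never touch the disk, while reopening the store on the same path replays the log and restores the last written state. Calling `fsync` on every write trades speed for durability; a real system would likely make that configurable.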
You never develop code without version control, so why do you develop your database without it?
LiquiBase is an open source (LGPL), database-independent library for tracking, managing and applying database changes. It is built on a simple premise: All database changes are stored in a human readable yet trackable form and checked into source control.
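The premise of tracked database changes can be sketched with Python's built-in `sqlite3` module. This is not LiquiBase's changelog format or API: the `CHANGESETS` list, the `changelog` tracking table, and the `migrate` function are hypothetical names used only to show the idea of applying each recorded change exactly once.

```python
import sqlite3

# Hypothetical changesets as (id, SQL) pairs. In a real project these
# would live in files checked into source control.
CHANGESETS = [
    ("001-create-person",
     "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002-add-email",
     "ALTER TABLE person ADD COLUMN email TEXT"),
]


def migrate(conn, changesets):
    """Apply changesets not yet recorded as applied; return their ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS changelog "
        "(id TEXT PRIMARY KEY, applied_at TEXT)"
    )
    applied = {row[0] for row in conn.execute("SELECT id FROM changelog")}
    newly_applied = []
    for change_id, sql in changesets:
        if change_id in applied:
            continue  # already applied to this database
        conn.execute(sql)
        conn.execute(
            "INSERT INTO changelog (id, applied_at) "
            "VALUES (?, datetime('now'))",
            (change_id,),
        )
        conn.commit()
        newly_applied.append(change_id)
    return newly_applied
```

Running `migrate` twice against the same connection applies both changesets the first time and nothing the second time, which is what makes it safe to run on every deployment.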
The article discusses how to set up MySQL Replication between two Amazon EC2 instances. It walks you through setting up replication for an empty database server. Adding a slave to a server already full of data is a different article.
It is assumed that you already know the basics of starting EC2 instances, connecting to them via SSH and editing files in Linux using vi/vim etc. For this tutorial, I am using the Amazon-built machine image ami-2b5fba42, which is a Fedora 8 base image.
arstechnica.com/business/data-centers/2010/02/-since-the-rise-of.ars, posted 2010 by peter in cloudcomputing scalability storage toread
Since the rise of the Web, SQL-based relational databases have been the dominant structured storage technology behind online applications. The past few years have seen the emergence of the cloud as a compelling environment for online application development, bringing true utility computing into the infrastructure pantheon. But the cloud and SQL do not mix well, and multiple efforts are now underway to offer viable alternatives to the venerable database. In this article, I'll review the forces that have led to this shift, and I'll argue that while relational databases are by no means doomed, they will soon be joined in the cloud, and possibly outshone, by new non-relational database technologies.
Trouble is, implementing the best scaling practices is not free, and is often overlooked early in a product's lifecycle. Small teams use modern frameworks to quickly develop useful applications, with little need to worry about scale: today you can run a successful application on very little infrastructure... at least, you can up to a point. Past this point lies an uncomfortable middle ground, where small teams face scaling challenges as their system becomes successful, often without the benefit of an ideal design or lots of resources to implement one. This article will lay out some pragmatic advice for getting past this point in the real world of limited foresight and budgets.
Technology has always been used as a memory aid, of course, but in past millennia, scratching on a clay tablet, writing with a fountain pen, and snapping a digital photo have all required an act of will. Humans had to choose what they would remember.
Now, in an age of ever-cheaper storage, the data committed to machine memory requires an act of will to delete. Storage is now so cheap, in fact, that it requires more effort to cull an e-mail inbox or photo gallery than it does to simply hold on to everything.
To get back to a default state of forgetfulness, Mayer-Schönberger offers an intriguing proposal: find simple ways to give data an expiration date.
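One way to picture an expiration date on data is a store that forgets items once their time is up. This is a minimal sketch, not a design from the book being discussed: `ExpiringStore`, its `ttl_seconds` parameter, and the injectable `clock` are hypothetical choices made for illustration.

```python
import time


class ExpiringStore:
    """Keep each item only until its expiration time has passed."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be simulated
        self._items = {}      # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # forget lazily, on access
            return default
        return value

    def purge(self):
        """Delete every expired item and return how many were removed."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._items.items() if now >= exp]
        for key in expired:
            del self._items[key]
        return len(expired)
```

Here forgetting is the default: keeping something longer requires the deliberate act of storing it again with a new lifetime.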
Over time, though, my initially rosy feelings towards ORMs have begun to sour. I gradually realised I was spending a disproportionate amount of time trying to coax the ORM into doing my bidding - and when I succeeded, the results were often ugly, slow and needlessly opaque. Analysing the performance of some of the more complicated portions of my data access layer was often painful, and I spent cumulative hours poring over generated SQL, trying to figure out what the ORM was doing and why. Usually, improving performance involved side-stepping the ORM altogether. Recently, a particularly gnarly performance issue prompted me to ditch the ORM from a project altogether, with surprisingly pleasant results.