WHY THIS MATTERS IN BRIEF
DNA can store huge volumes of information and it lasts for hundreds of thousands of years making it an ideal storage medium for tomorrow’s technologies.
The majority of the cells in the human body contain the information you need to build a complete person stored in DNA and scientists have been working hard over the past number of decades to try to develop new DNA storage technologies that could harness the incredible density of DNA to store other types of data. So far though, even though the successes are now starting to come thicker and faster than ever before, it’s been slow going. But now, following on from Columbia University’s announcement that they’d managed to store a whopping 215 petabytes of data on just one gram of DNA, and Microsoft’s announcement that they plan to make DNA storage available via the Microsoft Azure cloud platform in 2020, teams from the University of Washington and Microsoft in the US have passed another major milestone, they’ve managed to create a DNA random access system, or in short, the world’s first DNA based file system, and that’s a huge breakthrough.
DNA’s coding sequence is always described by four base pairs, that is unless you take into account this new six base pair DNA alien that scientists created last year, and these are Adenine, Cytosine, Guanine and Thymine which we normally see represented as A, C, T and G. In your cells these bases are read three at a time, and each set of three describes a different amino acid. Put amino acids together and you get a protein. To store something else as DNA, you need to come up with a different encoding scheme, and while there are several ways to achieve the real problem is how you read and retrieve the data in the right order so that it makes sense.
To read the data you’ve encoded in DNA back you first need to chop it up into shorter sequences because there’s no way to read a full, single piece of DNA. As a consequence this means that any future DNA storage system needs to use markers that tell the system where each individual file sequence starts and ends, and then it needs to be able to read that entire sequence in the right order to retrieve the single file you just stored. This is the breakthrough the two teams have made.
The teams have managed to add random access to DNA storage. The researchers did this by designing new sequence markers that can target specific files without accessing the unneeded files.
The key though is finding enough marker sequences to tag all your files and the good news is the team identified thousands that will work. That means you could amplify a specific sequence that identifies the files you want, and just sequence, or in plain English, read, those back. If you want to keep more files than you have markers then you simply have to store them on additional separate pools of DNA.
The other innovative tweak to DNA storage in the new study is the use of something called “Bit-flipping operation,” or strangely enough XOR for short, in long strings of identical bases. DNA sequencing tends to get messy when there are too many repeated bases so the team used XOR to insert a random sequence to break up these long runs and make the data faster to read.
The result is the world’s first functional file system for DNA, and this gets us closer than ever before to one day realising a future where DNA, not hard drives or SSD’s rule the storage world – even if realising that particular vision is still decades away.
That said though there are still issues to contend with such as storage read write speeds but as DNA storage continues to become an increasingly viable storage medium that could let us store all the world’s information in storage device no bigger than a shoe box, and as people look for new and novel storage solutions, including Atomic scale storage, for the forth coming generation of Quantum computers, and even DNA computers, that can process billions of times more information than their current logical computer counterparts, one thing’s certain – this is going to be a very interesting space to watch.