WHY THIS MATTERS IN BRIEF
With the amount of information on the planet exploding and traditional storage technologies reaching their theoretical limits, we need a new way to store the torrents of data society is creating, and DNA storage increasingly looks like the answer.
Microsoft, who last year announced that they’d broken the world record by storing 200MB of movies and documents on strands of DNA, the details of which were published in a paper on bioRxiv, have now announced that they’re going to be the world’s first company to offer DNA storage as a service as part of their Azure cloud offering. That’s on top of their projects to turn Azure into the world’s largest supercomputer and to build the world’s first commercially available quantum computer, something that would perfectly complement a DNA storage system. Furthermore, and this is the really exciting part, they plan on doing it by the end of the decade. This decade.
“Our aim is to create a proto-commercial system within three years and use it to store some amount of data in one of our Azure data centers, for at least a boutique application,” says Doug Carmean, a partner at Microsoft Research, who went on to explain that the eventual system will likely be the size of a large 1970s-era Xerox copier.
Internally, though, Microsoft is harbouring even greater ambitions, and sources say that one day they want to use DNA storage to replace all of their tape archiving systems, which today hold hundreds of petabytes of archived data and take up huge amounts of warehouse-sized floor space.
“We hope to get it branded as ‘Your Storage with DNA,’” says Carmean.
Microsoft’s latest plan signals just how seriously some technology companies are starting to take the idea of using DNA in their data centers, whether to store information or to create the next generation of DNA computing systems. When they’re realised, such systems could make even future quantum computing platforms, which could one day process certain problems hundreds of millions of times faster than today’s classical computers, look as outdated as the abacus.
So, what’s behind this newfound love for DNA? The reason, says Victor Zhirnov, Chief Scientist of the Semiconductor Research Corporation, is that efforts by the likes of HGST, Samsung, Seagate and Western Digital to shrink computer storage and increase its density are hitting physical limits. DNA, on the other hand, can store data at incredible densities; after all, nothing else on the planet comes close to its ability to store over 1,000 petabytes, or 1,000,000,000 gigabytes, per cubic millimeter.
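Taking the quoted figure of roughly 1,000 petabytes (10^18 bytes) per cubic millimeter at face value, a quick back-of-the-envelope sketch shows how much DNA a given amount of data would need. The density constant is simply the headline number above, an idealised figure rather than the measured density of any working system, and the one-zettabyte example is an arbitrary size chosen for illustration.

```python
# Back-of-the-envelope volume estimate, assuming the headline density quoted
# above (about 1,000 petabytes per cubic millimeter). This is an idealised
# figure, not the measured density of any working DNA storage system.

PETABYTE = 10**15  # bytes (decimal SI prefix)
GIGABYTE = 10**9   # bytes
ASSUMED_BYTES_PER_MM3 = 1_000 * PETABYTE


def dna_volume_mm3(num_bytes: int, bytes_per_mm3: int = ASSUMED_BYTES_PER_MM3) -> float:
    """Return the volume of DNA, in cubic millimeters, needed to hold num_bytes."""
    return num_bytes / bytes_per_mm3


# Sanity check: 1,000 PB per mm^3 is the same as 1,000,000,000 GB per mm^3.
print(ASSUMED_BYTES_PER_MM3 // GIGABYTE)

# Arbitrary example size: one zettabyte (10**21 bytes).
zettabyte = 10**21
volume = dna_volume_mm3(zettabyte)
print(f"{volume:,.0f} mm^3, or about {volume / 1000:.0f} cm^3")
```

At that density a zettabyte works out to about 1,000 cubic millimeters, or one cubic centimeter, a volume on the order of a sugar cube.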
To put this into perspective, DNA storage would let us store all the world’s movies in a device no bigger than a sugar cube, and all the world’s information in a shoebox. And that’s with today’s DNA, where scientists are “only” working with four nucleotide bases, namely A, G, C and T. With new six-base DNA, which adds synthetic X and Y nucleotides, emerging on the horizon, you could arguably store all the world’s information on a device no larger than a grain of sand, and have space to spare.
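The size of the alphabet is what sets how much each position in a strand can hold: a base drawn from four letters carries at most log2(4) = 2 bits, while one drawn from six letters carries at most log2(6), about 2.58 bits. The sketch below uses a simple, hypothetical mapping (00→A, 01→C, 10→G, 11→T) to turn bytes into bases and back, and computes that theoretical ceiling for both alphabets. It is only an illustration of the idea; real systems, including Microsoft’s, use more elaborate codecs with addressing and error correction, and avoid awkward sequences such as long runs of the same base.

```python
import math

# Hypothetical 2-bit mapping for illustration only; real DNA storage codecs add
# addressing, error correction and constraints on the sequences they produce.
BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Map each byte to four nucleotides, two bits per base (high bits first)."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Invert encode(): read the bases back four at a time."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


def max_bits_per_base(alphabet_size: int) -> float:
    """Theoretical information capacity of one position in an alphabet of this size."""
    return math.log2(alphabet_size)


print(encode(b"Hi"))                    # CAGACGGC
print(decode(encode(b"Hi")))            # b'Hi'
print(max_bits_per_base(4))             # 2.0
print(round(max_bits_per_base(6), 3))   # 2.585
```

Going from four letters to six raises the theoretical ceiling from 2 to about 2.58 bits per base, roughly 29% more information per position.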
Awesome!
“Density is driving everything,” says Zhirnov, “DNA is the densest known storage medium in the universe, just based on the laws of physics. That is the reason why people are looking into this, and the problem we are solving is the exponential growth of stored information.”
That said, converting digital bits into DNA code is still laborious and expensive because of the chemical process used to manufacture the customised DNA strands. In its demonstration project, for example, Microsoft used 13,448,372 custom strands made by a company called Twist Bioscience, which cost a million dollars, and when you consider that they were used to store just 200MB of data, that makes for a very expensive storage medium. But as new start-ups such as DNA Script, Nuclera Nucleics, Evonetix, Molecular Assemblies, Catalog DNA, Helixworks and Genome Foundry, a spin-off of Oxford Nanopore, continue to pile into the space, that cost will inevitably start falling.
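Using only the figures above, roughly $1 million for 13,448,372 strands holding 200MB, a few lines of arithmetic give the approximate cost per megabyte and the average payload per strand. Treating 200MB as 200 × 10^6 bytes and attributing the whole million dollars to those strands are assumptions; the exact payload and pricing in Microsoft’s project may differ.

```python
# Rough unit economics from the demonstration figures quoted above.
# Assumptions: 200MB = 200 * 10**6 bytes, and the total cost was about $1,000,000.
TOTAL_COST_USD = 1_000_000
NUM_STRANDS = 13_448_372
PAYLOAD_BYTES = 200 * 10**6

cost_per_mb = TOTAL_COST_USD / (PAYLOAD_BYTES / 10**6)  # dollars per megabyte
bytes_per_strand = PAYLOAD_BYTES / NUM_STRANDS          # average payload per strand
cost_per_strand = TOTAL_COST_USD / NUM_STRANDS          # average price per strand

print(f"${cost_per_mb:,.0f} per megabyte")          # $5,000 per megabyte
print(f"{bytes_per_strand:.1f} bytes per strand")   # 14.9 bytes per strand
print(f"${cost_per_strand:.3f} per strand")         # $0.074 per strand
```

Each strand, in other words, carries only around 15 bytes of payload on average, which is why the strand count, and the price of each strand, dominates the cost.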
“The main issue with DNA storage is the cost,” says Yaniv Erlich, a professor at Columbia University, “so the main question is whether Microsoft solved this problem, and based on their publication I didn’t see any progress towards that goal, but maybe they have something in their pipeline.”
According to the team at Microsoft, the cost of DNA storage needs to fall by a factor of at least 10,000 before it becomes viable and widely adopted. Many experts say that’s unlikely, although I have to disagree here (just look at the dramatic fall in the cost of sequencing a genome, from $100 million to just $500 in fifteen years), and Microsoft believes such advances could occur if the computer industry demands them.
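The sequencing comparison can be made concrete. A fall from $100 million to $500 over fifteen years is a 200,000-fold drop, which corresponds to a constant annual multiplier of (500 / 100,000,000)^(1/15), about 0.44, or roughly 56% cheaper each year. The sketch below computes that rate and, under the purely hypothetical assumption that DNA synthesis costs followed the same curve, how long the 10,000-fold reduction Microsoft wants would take.

```python
import math

# Figures quoted above for genome sequencing: $100 million down to $500 in 15 years.
START_COST_USD = 100_000_000
END_COST_USD = 500
YEARS = 15

total_fold_drop = START_COST_USD / END_COST_USD                 # overall reduction
annual_factor = (END_COST_USD / START_COST_USD) ** (1 / YEARS)  # cost multiplier per year

# Hypothetical: assume DNA *synthesis* costs fall at the same constant annual rate.
TARGET_FOLD_DROP = 10_000
years_needed = math.log(TARGET_FOLD_DROP) / -math.log(annual_factor)

print(f"total drop: {total_fold_drop:,.0f}x")
print(f"annual multiplier: {annual_factor:.2f} (about {1 - annual_factor:.0%} cheaper each year)")
print(f"years for a {TARGET_FOLD_DROP:,}x drop at that rate: {years_needed:.1f}")
```

At the sequencing rate, a 10,000-fold drop would take a little over eleven years. That is an extrapolation, not a forecast: nothing guarantees synthesis chemistry will follow the sequencing curve.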
Automating the process of writing data into DNA will also be critical. Based on the several weeks it took to carry out the initial experiment, Carmean estimates that data was written into DNA at a rate of only 400 bytes per second, a figure that will need to rise to at least 100 megabytes per second if DNA storage is to become a practical storage medium.
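To see how large that gap is, the short sketch below compares the two rates Carmean mentions and how long writing one terabyte would take at each. The one-terabyte figure is an arbitrary example size, not a number from the project, and all units are decimal.

```python
# Write-speed gap, using Carmean's estimate and target quoted above.
CURRENT_BYTES_PER_S = 400
TARGET_BYTES_PER_S = 100 * 10**6  # 100 megabytes per second (decimal)

speedup_needed = TARGET_BYTES_PER_S / CURRENT_BYTES_PER_S


def write_time_seconds(num_bytes: int, bytes_per_s: float) -> float:
    """Time to write num_bytes into DNA at a constant throughput."""
    return num_bytes / bytes_per_s


TERABYTE = 10**12  # arbitrary example size
SECONDS_PER_YEAR = 365.25 * 24 * 3600

print(f"speed-up needed: {speedup_needed:,.0f}x")
print(f"1 TB at 400 B/s: {write_time_seconds(TERABYTE, CURRENT_BYTES_PER_S) / SECONDS_PER_YEAR:.0f} years")
print(f"1 TB at 100 MB/s: {write_time_seconds(TERABYTE, TARGET_BYTES_PER_S) / 3600:.1f} hours")
```

The target is a 250,000-fold speed-up: a terabyte that would take roughly 79 years to write at 400 bytes per second would take under three hours at 100 megabytes per second.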
One exciting possibility being pursued by some of the start-ups trying to speed up the 40-year-old chemical process used to create custom DNA is to use enzymes instead, in the same way our bodies do. Jean Bolot, scientific director of Technicolor Research in Los Altos, says this is precisely what Technicolor is working on with Harvard University.
“I am confident we will have results to talk about this year,” says Bolot, who adds that his company has been in discussions with movie studios about how they might use DNA storage to store old movies, and even future Virtual Reality (VR) ones.
Reading data back out of DNA, though, is easier. In the original experiment the team used a high-speed sequencing machine, and they think that simply doubling its speed could be enough for commercial use.
Despite all of this, because writing and reading data to and from DNA will still be slow for at least the next few years, it’s inevitable that in the short term any early use of the technology will be limited to special use cases and archiving, which could, to name but a few, include storing data that has to be kept for legal or regulatory reasons, such as police body camera video, regulatory information and medical records.
Meanwhile, Zhirnov says that computer chip makers are starting to take DNA seriously because there are physical limits to how much data can be stored in conventional media, like tapes or hard drives. When he and his team first started looking seriously into its potential in 2013, he says, semiconductor experts who believed DNA was too “soft” were surprised to learn that it lasts a hundred to a thousand times longer than a silicon device. A good example is the fact that you can still read the DNA found in the bones of animals, such as mammoths, that died hundreds of thousands of years ago. Try doing that with a hard drive…
A spokesperson for Microsoft Research said the company could not confirm “specifics on a product plan” at this time, but inside the company the DNA storage idea is apparently gaining traction.
“Our internal people believe us, but not the tape storage people,” says Carmean, formerly a top chip designer at Intel.
In addition to being dense and durable, DNA has a further advantage that’s not often mentioned: its extreme relevance to the human species. Think of those old floppy disks you can’t read anymore, or clay tablets covered in indecipherable script. Unlike those types of media, DNA probably won’t ever go out of style.
“We’ll always be reading DNA as long as we are human,” says Carmean, and that’ll be a good thing if we want future generations to be able to see the information we’ve all been busy creating and storing. Like cat GIFs. Yeah. Lots of cat videos.