WHY THIS MATTERS IN BRIEF
As our data volumes explode we need new ways to store it all, and DNA is a perfect solution – if we can get it to work.
By all reports the modern world is facing a tsunami of data as we produce exabytes of the stuff every day, and DNA is emerging as one of the most viable ways, if not the most viable way, to store it all and preserve it for generations to come, with new research breaking records more regularly than Usain Bolt. Now researchers supported by Microsoft have created the world’s first system that can automatically translate digital information into genetic code and retrieve it again. Microsoft recently spent millions of dollars on customised DNA strands to prove it could store hundreds of petabytes in just one gram of DNA, then followed that up with the creation of the world’s first DNA storage file system.
In 2018 we created 33 Zettabytes (ZB), or 33 trillion Gigabytes, of data, according to analysts at IDC, who predict that by 2025 that figure will rise to 175 ZB. It’s also been estimated that if we were to store all our information on flash drives, by 2040 it would require 10 to 100 times the global supply of chip-grade silicon.
DNA, on the other hand, is so compact it could shrink a massive NSA-style hyperscale datacenter to the size of a small table – and that’s if you threw in the compute as well, not just the storage. But for that to become practical we need a DNA-based equivalent of a hard drive that lets you upload and download data in a simple and intuitive way. And this is where Microsoft comes in, as well as other companies such as Catalog Technologies, which I’ve discussed before and which is also developing DNA storage systems.
Scientists have already demonstrated their ability to store everything from text to videos in DNA, but the process still requires a lot of manual intervention.
“You can’t have a bunch of people running around a data center with pipettes – it’s too prone to human error, it’s too costly and the footprint would be too large,” lead author Chris Takahashi, senior research scientist at the University of Washington (UW), said in a statement.
So the researchers designed a desktop-sized device that carries out the entire process automatically. First, software converts digital data into the four DNA bases – the letters A, T, C, and G – that make up the individual building blocks of the genetic code.
The device then adds the required chemicals to a synthesizer to build the snippet of DNA and then stores it in a special vessel. When it’s time to read the data back out again, microfluidic pumps push the sample into a sequencer, where the genetic code is read before the software converts it back into 1s and 0s.
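To make the conversion step concrete, here is a minimal sketch of one way to map bytes to bases and back. The two-bits-per-base table and the function names are assumptions for illustration only; the article doesn't describe the researchers' actual encoding, and practical schemes typically add error correction and avoid long runs of the same base.

```python
# Illustrative mapping only: two bits per base. This is an assumption,
# not the encoding used by the UW and Microsoft researchers.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hello")
print(strand, len(strand))  # five bytes become 20 bases under this mapping
assert decode(strand) == b"Hello"
```

Under this toy mapping every byte becomes exactly four bases, which is the best case for a four-letter alphabet.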
There are quite a few caveats, though. For a start, they only stored the word “Hello,” which represents just five bytes of data. And in a paper describing the research in Scientific Reports, they say it took 21 hours to write the data and read it back out. Ouch!
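To put that speed in perspective, here is a rough calculation based on the five bytes and 21 hours quoted above. It assumes, purely for illustration, that the time scales linearly with the amount of data, which the paper does not claim.

```python
# Figures from the article: five bytes written and read back in 21 hours.
HELLO_BYTES = 5
ROUND_TRIP_HOURS = 21


def bytes_per_hour() -> float:
    """Effective round-trip throughput of the prototype."""
    return HELLO_BYTES / ROUND_TRIP_HOURS


def hours_for(num_bytes: int) -> float:
    """Naive linear extrapolation (an assumption, not a measured result)."""
    return num_bytes / bytes_per_hour()


print(f"{bytes_per_hour():.3f} bytes per hour")
years = hours_for(1_000_000) / (24 * 365)
print(f"One megabyte at that rate: roughly {years:.0f} years")
```

In practice synthesis and sequencing can handle many strands in parallel, so this is a worst-case illustration rather than a prediction.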
The device also costs roughly $10,000, and there’s no discussion of the cost of the precursor materials. For reference, Twist Bioscience, which supplies Microsoft’s customised DNA strands, charges between seven and nine cents per base, and even at an ideal two bits per base you’d need around four million bases for every megabyte of data.
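As a back-of-the-envelope check on those numbers, the sketch below multiplies the quoted per-base price by the number of bases needed. It assumes an ideal two bits per base with no redundancy or error correction, and that the per-base price applies linearly at any volume; both are simplifications, and the function names are made up for this example.

```python
# Assumes an ideal 2 bits per base with no redundancy, primers, or error
# correction; real systems would need more bases than this.
BITS_PER_BASE = 2
PRICE_PER_BASE_USD = (0.07, 0.09)  # per-base price range quoted in the article


def bases_needed(num_bytes: int) -> int:
    """Minimum number of bases to hold num_bytes at 2 bits per base."""
    return num_bytes * 8 // BITS_PER_BASE


def synthesis_cost_usd(num_bytes: int) -> tuple[float, float]:
    """Low and high cost estimates, assuming the price scales linearly."""
    n = bases_needed(num_bytes)
    low, high = PRICE_PER_BASE_USD
    return n * low, n * high


for label, size in [("'Hello' (5 bytes)", 5), ("1 megabyte", 1_000_000)]:
    low, high = synthesis_cost_usd(size)
    print(f"{label}: {bases_needed(size):,} bases, ${low:,.2f} to ${high:,.2f}")
```

Even under these generous assumptions a single megabyte runs to hundreds of thousands of dollars, which is why cheaper synthesis matters so much.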
But DNA technology is moving quickly. Sequencing the first human genome cost $2.7 billion and took 15 years, but just 20 years later private companies will do it for under $1,000 in a matter of weeks.
And Microsoft isn’t the only company working on DNA storage. Intel and Micron are also funding research, and last year MIT spinoff Catalog Technologies revealed it is building a machine the size of a bus that, sometime this year, should be able to write a Terabyte of data into DNA every day.
Its approach promises to be more cost-effective because, rather than directly converting 1s and 0s into specially synthesized DNA strands, it will use enzymes to arrange cheaper pre-made strands into larger molecules whose patterns encode the relevant data. A lack of detail has made it tricky to assess the viability of the idea, though.
Data storage is also not the only aspect of the digital world where DNA could play a greater role. A day before the UW research was published, scientists at the University of California, Davis revealed the first reprogrammable DNA computer in a paper in Nature.
It’s not the first time DNA has been used to carry out computation, but previously the DNA hardware had to be designed specifically for the task at hand. This time the researchers created hundreds of DNA strand building blocks that can be combined to create circuits that implement 21 different algorithms for simple tasks, like generating patterns or counting.
The technology isn’t going to replace silicon computers anytime soon, but could be used to carry out computation at a molecular level. That could involve directing the activities of nanoscale factories that assemble molecules or helping build tiny DNA robots, says Petr Sulc, an assistant professor at Arizona State University.
The UW scientists are also looking at introducing these kinds of computational capabilities into their DNA systems. They have developed processing techniques that use interactions between the molecules to search directly for data, such as images, in the DNA without converting it back into digital format.
They say their next steps will be to combine their new data storage device with these kinds of capabilities, as well as more advanced methods for mixing liquids that can move single droplets around using a grid of electrodes.