Filecoin is expected to raise millions in an initial coin offering.

A blockchain-based cloud storage technology called Filecoin has already raised $52 million from investors. The company is poised to raise millions more on Thursday when it begins selling units of its bitcoin-like cryptocurrency to a larger set of wealthy investors.

Filecoin aims to disrupt conventional cloud-based storage platforms from Amazon and others. If it succeeds, the technology could be worth billions of dollars. But the company will need to overcome some significant hurdles first.

First and foremost, Filecoin's technology doesn't actually exist yet. The Filecoin team has done extensive research and planning, producing a series of white papers describing the technology it's building. But an actual, working Filecoin network is still months away. When it launches, Filecoin will compete with rival blockchain storage networks, including Sia, which has been available to the public for two years.

"Filecoin currently is just a white paper," Sia co-founder David Vorick told us earlier this week.

The broader challenge for Filecoin and its more established competitors will be convincing customers that it's safe to entrust their data to a decentralized, blockchain-based storage network at all. In theory, blockchain-based storage could offer significant advantages, including lower costs and higher reliability. The technology is likely to be most appealing for people looking for low-cost, long-term data storage.

But the technology is going to need at least a few years to mature to the point where it's ready for mainstream use. The Sia network has relatively limited capacity, and, right now, using the technology involves significant hassles—including acquiring the Siacoin cryptocurrency on a digital currency exchange and configuring and running complex Sia client software.

How to use a blockchain to create decentralized storage

Blockchain storage networks aim to enable trustless markets for online storage, allowing customers to buy storage from relatively unknown vendors without having to worry about losing data.

The basic strategy is for a service provider to sign a contract promising to store the data and post collateral backing up the promise. If a service provider fails to keep its end of the bargain, it forfeits the collateral. The idea is reasonable in theory, but it's not really practical with conventional payment networks backed up by the conventional legal system. The system would easily get bogged down in costly disputes between service providers and their disgruntled customers.

But a blockchain provides an elegant solution. Here's how it works: when a storage contract begins, the service provider posts the root of a data structure called a Merkle tree that serves as a unique fingerprint for the customer's data.

This hierarchical data structure allows the service provider to provide a succinct cryptographic proof that it has any particular 64-byte chunk of the file. At regular intervals, the Sia network chooses one of the 64-byte chunks using a pseudorandom function based on the most recent block of the Sia blockchain. The service provider must respond by publishing the sequence of hashes that charts a path up the tree from that data block to the already-published Merkle tree root. This constitutes a cryptographic proof that the service provider still has that chunk of data stored on its servers.

A dishonest service provider can't predict or control which chunk of data will be chosen in each round of the challenge, so the only way to consistently respond to the challenges is to store the entire file. Service providers who fail to supply a proof too many times lose their collateral. The network can enforce these rules without help from the client because you only need to know the Merkle tree's root hash to verify the correctness of a proof.
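To make the challenge-and-proof cycle concrete, here is a minimal Python sketch of the scheme described above. It is not Sia's actual code: the choice of SHA-256, the leaf and node prefixes, the rule of duplicating the last node on odd-sized levels, and all the function names are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 64  # bytes per leaf, matching the 64-byte chunks described above


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def split_chunks(data: bytes) -> list:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]


def build_levels(chunks: list) -> list:
    """Return every level of the tree, leaves first, root last."""
    level = [h(b"\x00" + c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            # Assumption: pad odd levels by repeating the last node.
            level = level + [level[-1]]
            levels[-1] = level
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def challenge_index(block_hash: bytes, contract_id: bytes, num_chunks: int) -> int:
    """Pick a chunk pseudorandomly from the latest block hash (hypothetical rule)."""
    seed = h(block_hash + contract_id)
    return int.from_bytes(seed, "big") % num_chunks


def make_proof(levels: list, index: int) -> list:
    """Host side: collect the sibling hash at each level on the path to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root: bytes, chunk: bytes, index: int, proof: list) -> bool:
    """Network side: only the root, the chunk, and the proof are needed."""
    node = h(b"\x00" + chunk)
    for sibling in proof:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node == root


if __name__ == "__main__":
    data = bytes(range(256)) * 5            # 1,280 bytes, so 20 chunks
    chunks = split_chunks(data)
    levels = build_levels(chunks)
    root = levels[-1][0]                    # published when the contract begins
    idx = challenge_index(b"latest block hash", b"contract-1", len(chunks))
    proof = make_proof(levels, idx)
    print("challenged chunk:", idx, "proof hashes:", len(proof))
    print("proof accepted:", verify(root, chunks[idx], idx, proof))
```

Note that the verifier never touches the file itself. The proof grows with the logarithm of the number of chunks, which is why it is small enough to post to a blockchain.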

Rewards, penalties, and redundancy

Rewards and penalties on the Sia network are denominated in Siacoins, the cryptocurrency that powers the Sia network. Users buy Siacoins on an exchange, then spend them to purchase storage from service providers. Service providers post collateral in Siacoins and automatically get them back if they fulfill their contracts and furnish the required cryptographic proof to the Sia blockchain. As on the Bitcoin network, the Sia network is run by miners who earn new Siacoins as a reward for participating in the network's transaction-clearing process to build the Sia blockchain.
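The incentive structure can be sketched as a small state machine. This is only an illustration of the rules as described here, not Sia's contract format: the `StorageContract` class, the `max_missed` threshold, and the decision to refund the customer when the host defaults are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageContract:
    payment: int      # coins the customer locks up to pay for storage
    collateral: int   # coins the host posts to back its promise
    max_missed: int   # hypothetical: missed proofs tolerated before default
    missed: int = 0

    def record_challenge(self, proof_ok: bool) -> None:
        """Called once per challenge round with the result of proof verification."""
        if not proof_ok:
            self.missed += 1

    def settle(self) -> dict:
        """Pay out at the end of the contract term."""
        if self.missed > self.max_missed:
            # Host defaulted: it forfeits its collateral.
            # Refunding the customer is an assumption of this sketch.
            return {"host": 0, "customer": self.payment,
                    "forfeited": self.collateral}
        # Host kept its promise: it earns the payment and gets its collateral back.
        return {"host": self.payment + self.collateral, "customer": 0,
                "forfeited": 0}


if __name__ == "__main__":
    honest = StorageContract(payment=100, collateral=300, max_missed=2)
    for _ in range(10):
        honest.record_challenge(True)
    print("honest host:", honest.settle())

    flaky = StorageContract(payment=100, collateral=300, max_missed=2)
    for ok in [True, False, False, True, False]:
        flaky.record_challenge(ok)
    print("flaky host:", flaky.settle())
```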

Of course, some providers will default on their commitments anyway, but the customer can deal with this by storing redundant copies of the data with different providers. A naïve approach would be to store, say, five copies of each file with five different hosts. A technique called erasure coding, which splits a file up into multiple chunks and allows any chunk to be reconstructed from others, allows clients to do much better than that.
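The simplest possible erasure code shows the idea: split the data into two pieces and add a third piece that is the XOR of the first two. Any two of the three pieces are enough to rebuild the file. The sketch below is a toy under that assumption, with hypothetical function names. Real storage networks need more general codes (Reed-Solomon is a common choice) that tolerate many lost pieces.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> list:
    """Split data into two halves and add one XOR parity piece (a 2-of-3 code)."""
    if len(data) % 2:
        data += b"\x00"  # pad to an even length; the caller keeps the true length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return [a, b, xor_bytes(a, b)]


def decode(pieces: list, length: int) -> bytes:
    """Rebuild the data from [a, b, parity] with at most one entry set to None."""
    if sum(p is None for p in pieces) > 1:
        raise ValueError("need at least two of the three pieces")
    a, b, parity = pieces
    if a is None:
        a = xor_bytes(b, parity)
    elif b is None:
        b = xor_bytes(a, parity)
    return (a + b)[:length]


if __name__ == "__main__":
    original = b"store me on three different hosts"
    pieces = encode(original)
    for lost in range(3):
        surviving = list(pieces)
        surviving[lost] = None  # pretend this host went offline
        print("lost piece", lost, "recovered:",
              decode(surviving, len(original)) == original)
```

Each piece is half the size of the file, so the three pieces together take up 1.5 times the original, compared with 5 times for the five-copy approach.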

Sia co-founder David Vorick tells Ars that most Sia users currently use a redundancy factor of three, meaning that three bits are stored for each bit of the underlying data. But Vorick argues that customers will eventually be able to do much better, achieving very high reliability with a redundancy factor as low as 1.5. For example, a particular file might be split into 60 pieces and stored with 60 different hosts. The customer would be able to recover the file so long as at least 40 of those 60 hosts remain online.
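A quick calculation shows why a 40-of-60 scheme can beat three full copies despite using half the space. The sketch below assumes each host is online independently with the same probability, which is a simplification: real outages can be correlated. The function names and the 95 percent uptime figure used in the demo are illustrative.

```python
from math import comb


def redundancy_factor(n: int, k: int) -> float:
    """Stored bits per bit of data when any k of n pieces can rebuild the file."""
    return n / k


def recoverable_probability(n: int, k: int, host_uptime: float) -> float:
    """Probability that at least k of n independent hosts are online."""
    return sum(
        comb(n, i) * host_uptime**i * (1 - host_uptime) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    uptime = 0.95  # assumed per-host uptime
    for label, n, k in [("3 full copies", 3, 1), ("40-of-60 code", 60, 40)]:
        p_fail = 1 - recoverable_probability(n, k, uptime)
        print(f"{label}: redundancy {redundancy_factor(n, k):.1f}, "
              f"chance the file is unavailable {p_fail:.2e}")
```

Under these assumptions, losing the file in the 40-of-60 case requires at least 21 hosts to be offline at once, which is far less likely than three hosts failing together.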

All these details have to be handled on the client side of the network, since the whole point of the system is to avoid having to trust any single service provider. If you want to store data on the Sia network, you'll need to acquire Siacoins from a digital exchange, most likely by buying the more widely traded bitcoins first and then trading those for Siacoins. Then you'll need to download the Sia client software, which has options for creating storage contracts, uploading files, and so forth.

Filecoin aims to be better blockchain storage

Filecoin is based on the same basic idea as Sia, but it aims to make a few significant enhancements. One is a new algorithm for mining.

Mining is the collaborative process for building a blockchain. Sia uses an approach called proof-of-work that was pioneered by Bitcoin. Computers compete to solve a difficult mathematical problem, with the winner getting to add a new block to the blockchain and reward itself with new Siacoins. That extra computation isn't necessary to actually process Bitcoin transactions—it's essentially just make-work to prevent Sybil attacks and secure the network. And the amount of energy consumed by the Bitcoin network has grown steadily along with the price of bitcoins. The Bitcoin network's annual energy consumption measures in the terawatt-hour range.
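The "difficult mathematical problem" is essentially a guessing game: find a number that, when hashed together with the block, produces a result below some target. The toy sketch below illustrates that idea only. Real networks differ in their hash functions, block formats, and difficulty rules, and the names `mine` and `check` are hypothetical.

```python
import hashlib


def block_hash(block_data: bytes, nonce: int) -> int:
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def check(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Verifying a solution costs a single hash."""
    return block_hash(block_data, nonce) < (1 << (256 - difficulty_bits))


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Try nonces in order until the hash has `difficulty_bits` leading zero bits.

    Each extra bit of difficulty doubles the expected number of attempts.
    """
    nonce = 0
    while not check(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce


if __name__ == "__main__":
    data = b"example block of transactions"
    for bits in (8, 12, 16):
        nonce = mine(data, bits)
        print(f"{bits} bits: found nonce {nonce} after {nonce + 1} attempts")
```

The asymmetry is the point: finding a valid nonce takes many attempts, while checking one takes a single hash. The attempts themselves accomplish nothing else, which is the waste Filecoin's designers say they want to avoid.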

Filecoin aims to eliminate this waste by making storage, rather than computing power, the basis for influence on the Filecoin network. While Bitcoin and Sia miners stockpile ever more powerful computing hardware, Filecoin miners will amass more and more hard drives—hard drives that can actually be put to work storing user data.

Filecoin also aims to offer self-healing capabilities that Sia lacks. When a host drops off the Sia network and takes part of a client's data with it, it's good practice for the client to reconstruct the missing data from other copies (using the erasure coding techniques mentioned above), contract with a new host, and upload the reconstructed data. That means that Sia clients need to log onto the network about once a week to check whether any of their data needs this kind of repair.

Filecoin aims to make this unnecessary by offering automatic self-healing capabilities in the network itself. Under the Filecoin protocol, if a host disappears from the network—or fails to prove that it's still storing data it has promised to store—the network will notice and post a contract for a new host to reconstruct and store the missing data.

That's possible because Filecoin uses an encoding scheme that allows anyone to reconstruct missing data. That's different from the Sia network, where the encryption and encoding of the data is done by the client, which means only the client can reconstruct missing data.

Investors paid millions for a cryptocurrency that doesn’t exist yet

On paper, Filecoin's new capabilities sound like big improvements. But that's also Filecoin's biggest challenge at this point: Filecoin only exists on paper. Whereas Sia has been running a public network for two years, the Filecoin network is still months away from launch.

That hasn't deterred people from throwing cash in Filecoin's direction. Last week, Filecoin raised $52 million by selling units of the Filecoin digital currency to a handpicked group of Silicon Valley insiders. Since the Filecoin network hasn't been launched yet, these investors only got an IOU entitling them to a share of Filecoin currency once the network launches.

On Thursday, Filecoin will be holding an initial coin offering open to a wider range of investors. The offering is expected to raise tens, and possibly hundreds, of millions in additional funds for the company.

Unsurprisingly, Sia's Vorick sounded skeptical when we asked him about Filecoin on Tuesday. Vorick argues that it's difficult to design secure blockchain-based networks, which is why he hewed closely to Bitcoin's design in developing the Sia blockchain. And he argues that Filecoin's creators are underestimating the difficulty of implementing the ambitious ideas in their white papers.

"I think they have missed significant portions of the state of the art in threat models that leaves them vulnerable to a whole bunch of attacks," Vorick said.

So far, these kinds of concerns haven't dampened enthusiasm for Filecoin. The cryptocurrency world is going through an "initial coin offering" boom, with new cryptocurrencies raising millions of dollars for their creators. Last month, a cryptocurrency called Tezos set a new record by raising $232 million. Another blockchain-based storage provider, called Storj, says it raised $30 million selling tokens earlier this year.

Filecoin is hoping to get a slice of the ICO boom, and, in a largely unregulated market, the company has been careful to color inside the lines. The offering will be limited to investors wealthy enough to qualify as accredited investors under Securities and Exchange Commission rules. That will hopefully avoid the legal problems the Securities and Exchange Commission identified last month in a memo about the use of cryptocurrencies to raise investment funds.

Why blockchain storage could be a big deal

The larger question here is whether we should expect that any of these blockchain storage technologies—Filecoin, Sia, Storj, or ones not invented yet—could become a significant factor in the cloud storage market. Obviously, it's way too early to say for sure. Right now, Sia is closer to a research prototype than a production-ready enterprise product.

But Vorick has some pretty good arguments for being bullish about the technology over the long term. A big one is cost. Vorick says Sia customers can store data for as little as $2 per terabyte per month, an order of magnitude cheaper than Amazon's S3. (Update: an Ars reader points out that a better comparison might be Amazon's high-latency Glacier service, which charges $4 per terabyte per month for storage.) Vorick says a big reason for the difference in price is that Sia's decentralized model means that the network as a whole can be very reliable even if individual service providers offer only so-so uptimes.

A conventional cloud storage provider needs to do much better than that, with uptimes of 99.9 percent or better. Few customers would tolerate a service that was down for several minutes every day or several hours every month. And so companies have to spend piles of money on redundant systems, complex engineering, and round-the-clock staffing to ensure they can deliver excellent uptimes.

In contrast, Sia is designed for hosts with uptimes in the 95- to 98-percent range. Data is spread across many servers in a way that allows it to be reconstructed even if a few of them fail. And that means Sia hosts don't have to spend a lot of money on redundant power supplies or 24/7 support. Vorick says that makes running a Sia storage service radically cheaper.
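To put those percentages in perspective, uptime figures convert directly into expected downtime. The short sketch below does the arithmetic, assuming a 30-day month. The function names are made up for illustration.

```python
def downtime_minutes_per_day(uptime: float) -> float:
    return (1 - uptime) * 24 * 60


def downtime_hours_per_month(uptime: float, days: int = 30) -> float:
    return (1 - uptime) * days * 24


if __name__ == "__main__":
    for uptime in (0.999, 0.99, 0.98, 0.95):
        print(f"{uptime:.1%} uptime: "
              f"{downtime_minutes_per_day(uptime):.1f} min/day, "
              f"{downtime_hours_per_month(uptime):.1f} hours/month")
```

A host at 95 percent uptime is offline for more than an hour a day on average, roughly 50 times the downtime of a 99.9 percent service. That is the gap redundancy across many hosts has to cover.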

At the same time, the redundancy built into the system should make the system as a whole as reliable as the best cloud storage providers.

But...

Of course, as Vorick was quick to concede, this argument only works if the Sia software itself is perfectly reliable. And that's not a reasonable assumption right now. The software is only two years old and still changing rapidly.

But Vorick argues that it's only a matter of time before the software matures and customers start to trust the technology more. He draws an analogy to conventional cloud computing, which was initially viewed as a dangerous experiment by IT administrators accustomed to managing servers in their own data centers. Over time, however, the cost and convenience of cloud computing became too obvious to ignore. Today, cloud computing has become an industry standard.

Vorick predicts something similar will happen with decentralized, blockchain-based storage. Right now, it's still the domain of hobbyists, with a few thousand users and only around 70TB of user data. But it might not stay that way.

On the other hand, the technology's theoretical advantages might not be enough to move many real-world customers. While Sia's cost advantages are impressive today, the cost gap might narrow as the market improves. There's real value in paying for service from an established company that will make sure everything works properly—especially for IT professionals whose livelihood depends on minimizing problems.