Where Is Your Data Stored? Part 1 on Decentralized Storage

We felt that we should begin this series by talking about data because it is the foundation of everything we do online. Understanding where data is stored, and by whom, is an important first step in understanding the power dynamics at play in the world of data — a world in which we all operate by simply existing in modern society.

Storage is a topic many people turn to only out of necessity, paying for more space once they reach capacity. Most data is stored in the cloud with common “trusted third parties” (TTPs), such as Google Drive, Box, Dropbox, or iCloud. But decisions about where to store data should come down to more than necessity; they should prioritize secure and reliable data storage and retrieval.

There are many brilliant pieces written on the impact of data being stored and managed by TTPs, including this thread by Punk6529. It notes that third-party databases store data that affects many different aspects of our lives, for example whether our hotel reservation gets canceled or whether we are able to vote. Security failures within TTPs, which are completely out of the control of everyday users, can have catastrophic effects on what users can or cannot do in their daily lives.

Despite these concerns, most businesses end up allowing their data to be managed by TTPs. As they scale and their storage needs grow, having access to storage that is reliable and efficient becomes increasingly important. 

Centralized Storage Services (CSS)

The primary factors that led to centralized cloud computing’s rise to prominence include the following benefits:

  • Low or no upfront investment in infrastructure
  • Cloud computing costs are typically lower than in-house IT infrastructure due to economies of scale
  • As needs fluctuate, cloud-based operations allow organizations to rapidly scale up and down, increasing efficiency while saving money
  • Cloud service providers develop, maintain, and update their own system hardware and software, allowing customers to offload both the risks and the costs of operating infrastructure
  • Data stored in the cloud is easily accessible across all devices

Given these factors, the rise of cloud storage and computing should come as no surprise, and most businesses turn to established, centralized Infrastructure-as-a-Service (IaaS) providers for their storage needs. That said, these solutions are not without downsides, and their convenience can often come at the expense of security and privacy.

The team at Affinidi explores these downsides by outlining several concerns associated with TTPs, from data hacks to the monetization of sensitive information. Six of the ten largest data breaches of all time occurred within the last 5 years. Numerous cloud storage providers, traditional brick-and-mortar companies, and large software-based companies have instituted new policies in response to security breaches and to build consumer trust. These policies are designed to help protect both consumers and enterprises that store massive amounts of data in the cloud.

There is also increasing awareness among consumers that Big Tech and AdTech firms (e.g., Facebook and Google) use personal data for profit. While these firms argue they are “selling access” as opposed to “selling customer data”, the semantics don’t change the facts: customer information is collected for the sake of targeted advertising. This can play a dangerous role in the rise of political polarization, as became clear in 2018 when the Cambridge Analytica scandal exposed the role of social media advertising in the 2016 U.S. presidential election. Centralized data storage itself is not to blame for this phenomenon, but with centralized storage comes a lack of consumer control over, and ownership of, personal data, and that is the key underlying issue.

Overall, despite centralized storage systems’ first-mover advantage and superior user experience (both from a technical perspective and from a customer service perspective), decentralized storage solutions are becoming increasingly appealing to many. But before we go into the benefits of decentralized storage, let’s make sure we are all on the same page about what exactly decentralized storage is.

Decentralized Storage Services

A decentralized storage system (DSS) is a peer-to-peer (P2P) cloud storage solution in which user-operators rent out spare disk space on their drives, often in return for token incentives. DSS data is sharded (a fancy way of saying split into many pieces), encrypted, and then distributed over an extensive system of nodes (computers that run the blockchain software) worldwide. Files are encrypted with private keys, and only users who hold those keys can access the data, which makes it highly secure.
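As a rough illustration of the sharding and distribution steps described above, here is a minimal Python sketch. The function names, the fixed shard size, and the round-robin placement are all assumptions made for illustration, not how any particular network places data. Encryption is deliberately left out to keep the example to the standard library; a real client would encrypt each shard before it leaves the user's machine.

```python
import hashlib


def shard(data: bytes, shard_size: int) -> list[bytes]:
    """Split data into fixed-size pieces (the last piece may be shorter)."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def distribute(shards: list[bytes], node_ids: list[str],
               copies: int = 2) -> dict[str, list[tuple[int, str]]]:
    """Assign every shard to `copies` distinct nodes, round-robin.

    Returns a map of node id -> list of (shard index, SHA-256 hex digest).
    The placement rule is a hypothetical stand-in for a real network's logic.
    """
    if copies > len(node_ids):
        raise ValueError("not enough nodes for the requested number of copies")
    placement: dict[str, list[tuple[int, str]]] = {node: [] for node in node_ids}
    for index, piece in enumerate(shards):
        digest = hashlib.sha256(piece).hexdigest()
        for offset in range(copies):
            node = node_ids[(index + offset) % len(node_ids)]
            placement[node].append((index, digest))
    return placement


def reassemble(shards: list[bytes]) -> bytes:
    """Join shards back together in their original order."""
    return b"".join(shards)
```

Calling `distribute(shard(data, 8), ["node-a", "node-b", "node-c"])` spreads the pieces so that each one lives on two different nodes, which is the redundancy that lets the file survive a single node going offline.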

A DSS is defined by who controls and has access to data, much more than it is defined by the location or technical architecture of the data. While your data is already in multiple locations using CSS (data you store is copied multiple times to redundant and fail-safe back-up locations to increase the reliability of the system regardless of who you store it with), it is only considered decentralized in the web3 sense of the term if central entities are not controlling the data.

A DSS offers many benefits over traditional centralized cloud storage solutions. For one, it is censorship resistant and has built-in redundancy, which reduces the risk of downtime during which data may be inaccessible. With data encryption built in, the data is not even accessible to the network participants who are storing it. Given that data can only be accessed using a key which contains a hash of the data itself, any change to the data changes the hash and is easy to detect, which makes stored data virtually tamper-proof. To maliciously access data stored on decentralized storage platforms, malign actors would have to launch concurrent attacks on multiple storage nodes worldwide, which is for all intents and purposes impossible. There are many use cases where censorship resistance is extremely valuable. One can think, for instance, of documenting human rights abuses and ensuring key pieces of evidence won’t disappear. A less extreme example would be tracking supply chain movements to ensure a company’s logistical procurement details line up with what it claims to be true.
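The idea that data is retrieved by a key derived from its own hash can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `ContentStore` class is hypothetical, and it uses a bare SHA-256 hex digest as the address, whereas IPFS wraps the hash in a richer content identifier (CID).

```python
import hashlib


class ContentStore:
    """A toy content-addressed store: the address of a blob is its hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address (SHA-256 hex digest)."""
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return the data for an address, refusing anything that was altered."""
        data = self._blobs[address]
        # Re-hash on every read: modified bytes can no longer match the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored data does not match its address")
        return data
```

Because the reader recomputes the hash, a storage node that alters even one byte cannot pass the result off as the original; the mismatch is caught on retrieval.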

DSS also has higher availability (i.e., uptime) than CSS providers. As long as a DSS platform is running, your data can be accessed on request. Filecoin specifically uses proof-of-replication (PoRep) and proof-of-spacetime (PoSt) protocols to verify that nodes are holding copies of stored files as agreed. On DSS, data is immutable, meaning it cannot be tampered with, which removes the need for trust between parties and providers.
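To give a feel for how a network can check that a node still holds a file, here is a deliberately simple challenge-response sketch in Python. It is not Filecoin's PoRep or PoSt, which are considerably more involved; the function names and the scheme itself are assumptions chosen only to illustrate the general idea that a correct answer requires possession of the data.

```python
import hashlib
import hmac
import secrets


def make_challenges(data: bytes, count: int) -> list[tuple[bytes, str]]:
    """Before uploading, the owner precomputes (nonce, expected answer) pairs."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        expected = hashlib.sha256(nonce + data).hexdigest()
        challenges.append((nonce, expected))
    return challenges


def respond(stored: bytes, nonce: bytes) -> str:
    """The storage node answers a challenge using the bytes it actually holds."""
    return hashlib.sha256(nonce + stored).hexdigest()


def verify(expected: str, response: str) -> bool:
    """The owner compares the answer with the precomputed value."""
    return hmac.compare_digest(expected, response)
```

Each nonce is used once, so a node cannot cache old answers and then discard the file; it has to keep the full data around to answer the next unseen challenge.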

Another advantage of DSS is its lower cost. While storing data directly on blockchains would be prohibitively expensive, cryptographically hashing data in IPFS and then storing it with a Filecoin miner is far less expensive. According to this article, storing data using IPFS costs less than 1% as much as storing the same amount of data on Amazon Web Services. According to this Storj article, because decentralized storage can take advantage of unused storage capacity and carries less overhead, providers can offer it at a fraction of the cost of CSS.

While decentralized storage has clear advantages, it’s important to acknowledge that there remains a long road ahead in terms of making it more accessible to the average user. There are also unresolved questions around liability, regulation, and more, and the limited answers so far remain a lingering obstacle to the rise of decentralized storage solutions.

Here are a few things we believe still need to be worked out in order for DSS to achieve its fullest potential:

  • Faster data retrieval speeds: A supposed benefit of decentralized storage is faster retrieval of information. In centralized data models, a user makes a request via a URL, and the server delivers the file using HTTP. In P2P storage systems, files are assigned a cryptographic hash. When the user requests the data, the network searches for an exact match of the hash. Ideally, this process should be almost instantaneous. In reality, network searches can take a long time and will need to get faster to attract more prospective users.
  • Focus on sustainability: Another supposed benefit of decentralized storage is that it can be more sustainable and use less energy than centralized storage because it can use existing unused storage capacity on recycled hardware around the world. In reality, there is no requirement for DSS to use recycled hardware, and most data providers favor energy-intensive data centers, which operate just like CSS data centers, over unused storage space. DSS providers must also account for the overhead of running a decentralized network, including paying potentially thousands of participants, which isn’t required in CSS.

Overall, we would be getting ahead of ourselves if we claimed that decentralized storage solutions are a one-size-fits-all panacea for safely storing the ever-growing amounts of data necessitated by an increasingly digital world. While there are many aspects of decentralized storage that need to be improved, the underlying technology is extremely promising. In the next article, we will cover some of the main players in the decentralized storage landscape.

In the meantime, if you’re interested in learning more, here are a few resources to check out:

Where is Filecoin Heading? by FoxWallet Official

Decentralized Storage, by Wackerow

The Pros and Cons of Decentralized Storage, by Susnigdha Tripathy

The benefits of storing data in a decentralized cloud, by Martin Bolima

Decentralized storage: The future of data storage? by Pixelplex