Erasure Coding
Read about our Data Security Framework first!
Overview
Akave's erasure coding is a data protection method that splits data into fragments and distributes them across multiple storage nodes. By adding parity fragments, Akave ensures that data can be reconstructed even if some fragments are lost or corrupted. This approach combines data resilience with efficient storage utilization.
Key Components
Data Splitting
Original files are divided into chunks, typically 32 MB in size.
Each chunk is further subdivided into smaller data blocks of up to 1 MB each for processing and distribution, as sketched below.
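A minimal sketch of this two-level split in Go, assuming plain byte-range slicing. The 32 MB and 1 MB sizes come from the text above; the input file name and the absence of any padding or framing are illustrative assumptions:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

const (
	chunkSize = 32 << 20 // 32 MB chunks, as described above
	blockSize = 1 << 20  // 1 MB data blocks
)

// splitChunk subdivides one chunk into blocks of at most blockSize bytes.
func splitChunk(chunk []byte) [][]byte {
	var blocks [][]byte
	for len(chunk) > 0 {
		n := blockSize
		if len(chunk) < n {
			n = len(chunk)
		}
		blocks = append(blocks, chunk[:n])
		chunk = chunk[n:]
	}
	return blocks
}

func main() {
	f, err := os.Open("example.dat") // hypothetical input file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	buf := make([]byte, chunkSize)
	for i := 0; ; i++ {
		n, err := io.ReadFull(f, buf)
		if n > 0 {
			fmt.Printf("chunk %d: %d bytes -> %d blocks\n", i, n, len(splitChunk(buf[:n])))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			break // last (possibly partial) chunk processed
		}
		if err != nil {
			panic(err)
		}
	}
}
```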
Reed-Solomon Algorithm
Akave employs the Reed-Solomon error correction algorithm to generate parity blocks for each data block.
Parity blocks provide redundancy: the original data can be recovered as long as the number of unavailable blocks, data and parity combined, does not exceed the number of parity blocks generated.
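As a sketch of what parity generation looks like in practice, the snippet below uses github.com/klauspost/reedsolomon, a widely used Go Reed-Solomon library (not necessarily the one Akave runs). The 4 data + 2 parity split is an illustrative assumption; Akave's actual ratio is not stated here:

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// Illustrative parameters: 4 data shards protected by 2 parity shards.
	enc, err := reedsolomon.New(4, 2)
	if err != nil {
		panic(err)
	}

	// Stand-in payload for a data block; Split divides it into 4 data
	// shards and allocates 2 (initially empty) parity shards.
	shards, err := enc.Split([]byte("example payload standing in for a data block"))
	if err != nil {
		panic(err)
	}

	// Encode computes the parity shards from the data shards.
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	ok, _ := enc.Verify(shards)
	fmt.Printf("4 data + 2 parity shards, parity consistent: %v\n", ok)
}
```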
Network Distribution
Data and parity blocks are distributed across a network of storage nodes.
Nodes are geographically dispersed to enhance reliability and reduce the risk of data loss from localized failures.
In addition, each storage node replicates its data blocks to the Filecoin network. Leveraging Filecoin's distributed storage adds a further layer of resiliency and decentralization, protecting against data loss or corruption.
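A toy sketch of block placement, assuming simple round-robin assignment across nodes. The node names are hypothetical, and Akave's real placement policy (including how geographic dispersion and Filecoin replication are scheduled) is not specified here:

```go
package main

import "fmt"

func main() {
	// Hypothetical geographically dispersed storage nodes.
	nodes := []string{"node-eu-1", "node-us-1", "node-ap-1"}
	// 4 data blocks and 2 parity blocks from one encoded set.
	blocks := []string{"d0", "d1", "d2", "d3", "p0", "p1"}

	// Assign blocks to nodes round-robin so no node holds enough
	// of one set to be a single point of failure.
	placement := make(map[string][]string)
	for i, b := range blocks {
		n := nodes[i%len(nodes)]
		placement[n] = append(placement[n], b)
	}
	for _, n := range nodes {
		fmt.Println(n, "->", placement[n])
	}
}
```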
Data Sampling and Mini-Proving
Akave performs periodic data sampling and mini-proving on stored data blocks to verify their integrity.
If a storage node becomes unavailable, these mechanisms identify missing or corrupted data blocks.
The system then rebuilds the missing data blocks using the Reed-Solomon algorithm, ensuring uninterrupted availability.
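The details of Akave's mini-proving protocol are not given here; as a rough sketch, integrity sampling can be thought of as periodically re-hashing a random subset of blocks and comparing the result against digests recorded at write time. Everything below (the block type, SHA-256 as the digest, the sample size) is an illustrative assumption:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"math/rand"
)

// block pairs stored bytes with the digest recorded when it was written.
type block struct {
	id   string
	data []byte
	want [32]byte
}

// sample re-hashes k randomly chosen blocks and reports mismatches,
// which would then be queued for Reed-Solomon rebuilding.
func sample(blocks []block, k int) []string {
	var corrupted []string
	for _, i := range rand.Perm(len(blocks))[:k] {
		if got := sha256.Sum256(blocks[i].data); !bytes.Equal(got[:], blocks[i].want[:]) {
			corrupted = append(corrupted, blocks[i].id)
		}
	}
	return corrupted
}

func main() {
	good := block{id: "d0", data: []byte("intact")}
	good.want = sha256.Sum256(good.data)

	bad := block{id: "d1", data: []byte("bit-flipped")}
	bad.want = sha256.Sum256([]byte("original")) // digest no longer matches

	fmt.Println("corrupted blocks:", sample([]block{good, bad}, 2))
}
```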
Process Flow
Data Chunking
A file is split into fixed-size chunks, which are further divided into smaller data blocks.
Parity Generation
Each set of data blocks is processed using the Reed-Solomon algorithm to produce parity blocks.
Example: for every x data blocks, n parity blocks are generated, for a total of x + n blocks. With x = 4 and n = 2, for instance, the original data survives the loss of any 2 of the 6 blocks.
Distribution
Data and parity blocks are distributed across multiple storage nodes in the network.
Each node also replicates the data blocks to the Filecoin network for enhanced reliability and durability.
Recovery
During retrieval, available data blocks and parity blocks are collected.
The Reed-Solomon algorithm reconstructs missing data blocks, ensuring data integrity.
Data sampling and mini-proving mechanisms help proactively identify and rebuild missing data blocks before they impact data availability.
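Continuing the earlier github.com/klauspost/reedsolomon sketch (same illustrative 4 + 2 parameters), this shows the reconstruction step: with two shards missing, the survivors are enough to rebuild them:

```go
package main

import (
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	enc, err := reedsolomon.New(4, 2) // illustrative 4 data + 2 parity
	if err != nil {
		panic(err)
	}

	shards, err := enc.Split([]byte("payload standing in for a data block"))
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate two lost shards, e.g. two unreachable storage nodes.
	shards[1], shards[4] = nil, nil

	// Reconstruct recomputes the nil shards from the four survivors.
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}
	ok, _ := enc.Verify(shards)
	fmt.Println("all shards reconstructed and consistent:", ok)
}
```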
Security and Reliability Contributions
Resilience: Data can be recovered even if several storage nodes are unavailable.
Efficiency: Provides strong fault tolerance with far less storage overhead than full replication, adding only n parity blocks per x data blocks.
Scalability: Adapts easily to growing data volumes and node counts.
Enhanced Reliability: Filecoin replication provides a secondary layer of protection against data loss, ensuring long-term availability and security.
Proactive Integrity Verification: Data sampling and mini-proving ensure that data integrity issues are identified and resolved promptly.