Accessing Cloud Storage outside a web browser has always been a significant challenge. Until now, there has been no cost-effective yet comprehensive solution offering native NFS, Windows (CIFS or SMB), SFTP, iSCSI, or other storage protocols with native connectivity to Cloud Storage Providers. BridgeSTOR remedies this with its own file system, CSFS (Cloud Storage File System). CSFS runs in the Linux kernel and converts files to Cloud Objects, and numerous sophisticated features were developed to transfer files into the cloud efficiently.
CSFS is a Linux file system, available on CentOS, Red Hat, and SUSE, that translates POSIX file system calls into REST object-based calls for Cloud Storage. The REST API in question was originally developed for Amazon S3 Cloud Storage and is rapidly becoming a de facto standard among both Cloud Storage and Object Storage vendors. CSFS back-end technology communicates with Cloud Storage Providers over this REST interface; for example, standard file system calls to create, read, write, and delete files are translated into GET and PUT REST calls.
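As a rough illustration of this translation, the sketch below maps POSIX-style operations onto S3-style REST calls in user space. The endpoint, bucket, and function names are hypothetical assumptions for the example; CSFS itself performs this work inside the Linux kernel.

```python
# Minimal sketch: how POSIX-style operations might map onto S3-style
# REST calls. The endpoint and bucket are hypothetical placeholders;
# CSFS performs this translation in the kernel, not in user space.
import requests

ENDPOINT = "https://s3.example.com"   # hypothetical Cloud Storage endpoint
BUCKET = "csfs-demo"                  # hypothetical bucket

def write_file(path: str, data: bytes) -> None:
    """A POSIX write becomes a REST PUT of an object."""
    resp = requests.put(f"{ENDPOINT}/{BUCKET}/{path.lstrip('/')}", data=data)
    resp.raise_for_status()

def read_file(path: str) -> bytes:
    """A POSIX read becomes a REST GET of the same object."""
    resp = requests.get(f"{ENDPOINT}/{BUCKET}/{path.lstrip('/')}")
    resp.raise_for_status()
    return resp.content

def delete_file(path: str) -> None:
    """A POSIX unlink becomes a REST DELETE."""
    requests.delete(f"{ENDPOINT}/{BUCKET}/{path.lstrip('/')}").raise_for_status()
```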
CSFS resides 100% in the Linux kernel and does not rely on FUSE, which allows it to include an advanced, accelerated I/O path. Most file systems today still use single-threaded I/O, which limits access to and from the disk. CSFS solves this by adding asynchronous processing to the Linux environment: as your data is processed, CSFS consolidates it into large blocks, and as each large block is completed, it is sent off to Cloud Storage in the background. This caching replaces the local Linux buffer cache and greatly enhances access speeds. When link speeds permit, single-file writes can easily reach 500 MB/sec or more over Windows and NFS.
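The following user-space sketch captures the write-back idea described above: foreground writes return as soon as data lands in the cache, while a background thread ships completed large blocks to Cloud Storage. The class, block size, and upload hook are illustrative assumptions; CSFS implements this in-kernel.

```python
# User-space sketch of write-back caching: small writes are consolidated
# into large blocks, and completed blocks are uploaded in the background.
import queue
import threading

BLOCK_SIZE = 4 * 1024 * 1024  # assume 4 MiB consolidation blocks

class WriteBackCache:
    def __init__(self, upload):            # upload: callable that sends one block
        self._buffer = bytearray()
        self._full_blocks = queue.Queue()
        self._upload = upload
        threading.Thread(target=self._flusher, daemon=True).start()

    def write(self, data: bytes) -> None:
        """Foreground write: returns as soon as data lands in the cache."""
        self._buffer.extend(data)
        while len(self._buffer) >= BLOCK_SIZE:
            self._full_blocks.put(bytes(self._buffer[:BLOCK_SIZE]))
            del self._buffer[:BLOCK_SIZE]

    def _flusher(self) -> None:
        """Background thread: ships each completed block to Cloud Storage."""
        while True:
            self._upload(self._full_blocks.get())

# Usage: the write returns immediately; uploads proceed in the background.
cache = WriteBackCache(upload=lambda block: print(f"uploading {len(block)} bytes"))
cache.write(b"x" * (5 * 1024 * 1024))
```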
A Single Name Space has been on many IT professionals' wish lists for years. CSFS now allows customers to create a Global File View of all their Cloud Storage Objects. It does this by separating file system metadata from the physical data while keeping the two combined as a single object. This significant architectural achievement allows metadata to be stored in one location while the physical data is maintained in Cloud Storage; for Disaster Recovery purposes, the metadata is also saved in Cloud Storage. The global view caching server is typically maintained in a separate, local clustered VM environment. In this way, CSFS exposes all files and directories locally for a quick view of the files without accessing the actual Objects.
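To make the metadata/data split concrete, here is a small sketch showing how a caching server can answer directory listings from metadata alone, without fetching any Objects. The record fields and index layout are assumptions for illustration, not CSFS's actual on-disk format.

```python
# Illustrative sketch of the metadata/data separation: a directory
# listing is served entirely from the metadata index, while the
# physical data stays in Cloud Storage behind the object_key.
from dataclasses import dataclass

@dataclass
class FileMetadata:
    path: str          # POSIX path exposed in the global view
    size: int          # logical file size in bytes
    object_key: str    # location of the physical data in Cloud Storage

# The global view is a metadata index; listing never touches the Objects.
index = [
    FileMetadata("/projects/report.docx", 104_857, "bucket/objects/a1b2c3"),
    FileMetadata("/projects/budget.xlsx", 52_428, "bucket/objects/d4e5f6"),
]

def list_directory(prefix: str) -> list[str]:
    return [m.path for m in index if m.path.startswith(prefix)]

print(list_directory("/projects/"))
```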
Single File Instance allows a file that is duplicated many times to be saved only once in Cloud Storage. CSFS uses a block-level data deduplication algorithm to detect duplicate files. In CSFS, block-level deduplication means that blocks of data are "fingerprinted" using a hashing algorithm (SHA-1), which produces a unique "shorthand" identifier for each data block. These unique fingerprints, along with the blocks of data that produced them, are indexed, optionally compressed and encrypted, and then retained. Duplicate copies of data that have previously been fingerprinted are deduplicated, leaving only a single instance of each unique data block along with its corresponding fingerprint. Once a block's fingerprint value has been calculated, the deduplication engine must compare it against all the fingerprints generated previously to determine whether the block is unique (new) or has been processed before (a duplicate). The speed at which these index search and update operations are performed is at the heart of a deduplication system's throughput.
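A minimal sketch of this fingerprint-and-index cycle follows. The in-memory dictionary stands in for the high-speed index the passage describes, and the block size is an illustrative assumption; the SHA-1 fingerprinting itself matches what the text states CSFS uses.

```python
# Block-level deduplication sketch: each block is fingerprinted with
# SHA-1, and only blocks with previously unseen fingerprints are stored.
import hashlib

BLOCK_SIZE = 128 * 1024          # assume 128 KiB deduplication blocks
index: dict[str, bytes] = {}     # fingerprint -> unique data block

def store(data: bytes) -> list[str]:
    """Split data into blocks; retain each unique block exactly once."""
    fingerprints = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha1(block).hexdigest()   # "shorthand" identifier
        if fp not in index:                    # new block: retain it
            index[fp] = block                  # duplicates keep only the reference
        fingerprints.append(fp)
    return fingerprints

def restore(fingerprints: list[str]) -> bytes:
    """Rebuild the original data from its fingerprint list."""
    return b"".join(index[fp] for fp in fingerprints)

# Writing the same data twice stores its blocks only once.
refs = store(b"duplicate payload" * 50_000)
refs_again = store(b"duplicate payload" * 50_000)
assert restore(refs) == restore(refs_again)
```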
Data compression re-encodes data so that it occupies less physical storage space. Compression algorithms search for repeatable patterns of binary 0s and 1s within data structures and replace them with shorter patterns; the more repeatable patterns the algorithm finds, the more the data is compressed. Because compression is based on the contents of the data stream, the algorithm adapts dynamically to different types of data to optimize its effectiveness. To "decompress" compressed data, the operations that produced the compression are reversed. The effectiveness of any compression method therefore varies with the characteristics of the data being compressed. CSFS optionally utilizes data compression to save space within Cloud Storage. Deduplication and compression are also critical when sending data over low-speed lines: if they yield an overall data reduction of 50%, the data can be sent in half the time.
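The short sketch below demonstrates the reversibility and space savings described above, using zlib as a stand-in codec (the text does not name the algorithm CSFS actually employs). The measured reduction maps directly onto transfer time over a low-speed link.

```python
# Compression sketch: highly repetitive data compresses well, and
# decompression exactly reverses the encoding. zlib is an assumed
# stand-in; CSFS's actual codec is not specified in the source.
import zlib

original = b"repeatable pattern " * 10_000
compressed = zlib.compress(original)

print(f"original:   {len(original):>8} bytes")
print(f"compressed: {len(compressed):>8} bytes")
print(f"reduction:  {1 - len(compressed) / len(original):.0%}")

assert zlib.decompress(compressed) == original  # decompression reverses it
```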
BridgeSTOR is a Software Defined Storage Company and creator of the Cloud Storage File System (CSFS). CSFS extends and enhances the accessibility and manageability of cloud storage for organizations large and small. Whether on the road or in the office, CSFS provides the ability to access documents, edit files and collaborate on projects in real time.
Copyright © 2017 BridgeSTOR. All Rights Reserved.