Distributed file system for cloud

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called chunks, and each chunk may be stored on a different remote machine, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability, and integrity are the main keys for a secure system.

Users can share computing resources through the Internet thanks to cloud computing, which is typically characterized by scalable and elastic resources – such as physical servers, applications, and services – that are virtualized and allocated dynamically. Synchronization is required to make sure that all devices are up-to-date.

Distributed file systems enable many big, medium, and small enterprises to store and access their remote data as they do local data, facilitating the use of variable resources.

Overview
History

Today, there are many implementations of distributed file systems. The first file servers were developed by researchers in the 1970s. Sun Microsystems' Network File System became available in the 1980s. Before that, people who wanted to share files used the sneakernet method, physically transporting files on storage media from place to place. Once computer networks started to proliferate, it became obvious that the existing file systems had many limitations and were unsuitable for multi-user environments. Users initially used FTP to share files.[1] FTP first ran on the PDP-10 at the end of 1973. Even with FTP, files needed to be copied from the source computer onto a server and then from the server onto the destination computer. Users were required to know the physical addresses of all computers involved with the file sharing.[2]

Supporting techniques

Modern data centers must support large, heterogeneous environments, consisting of large numbers of computers of varying capacities. Cloud computing coordinates the operation of all such systems, with techniques such as data center networking (DCN), the MapReduce framework, which supports data-intensive computing applications in parallel and distributed systems, and virtualization techniques that provide dynamic resource allocation, allowing multiple operating systems to coexist on the same physical server.

Applications

Cloud computing provides large-scale computing thanks to its ability to provide the needed CPU and storage resources to the user with complete transparency. This makes cloud computing particularly suited to support different types of applications that require large-scale distributed processing. This data-intensive computing needs a high performance file system that can share data between virtual machines (VM).[3]

Cloud computing dynamically allocates the needed resources, releasing them once a task is finished, requiring users to pay only for needed services, often via a service-level agreement. Cloud computing and cluster computing paradigms are becoming increasingly important to industrial data processing and scientific applications such as astronomy and physics, which frequently require the availability of large numbers of computers to carry out experiments.[4]

Architectures

Most distributed file systems are built on the client-server architecture, but other, decentralized, solutions exist as well.

Client-server architecture

Network File System (NFS) uses a client-server architecture, which allows files to be shared between a number of machines on a network as if they were located locally, providing a standardized view. The NFS protocol allows heterogeneous client processes, possibly running on different machines and under different operating systems, to access files on a distant server, independent of the actual location of the files. Because it relies on a single server, the NFS protocol suffers from potentially low availability and poor scalability. Using multiple servers does not solve the availability problem, since each server works independently.[5] The model of NFS is a remote file service. This model is also called the remote access model, in contrast with the upload/download model:

  • Remote access model: provides transparency; the client has access to a file and sends requests to it while the file remains on the server.[6]
  • Upload/download model: the client can access the file only locally; it has to download the file, make modifications, and upload it again so that other clients can use it.

The file system used by NFS is almost the same as the one used by Unix systems. Files are hierarchically organized into a naming graph in which directories and files are represented by nodes.

Cluster-based architectures

A cluster-based architecture ameliorates some of the issues in client-server architectures, improving the execution of applications in parallel. The technique used here is file-striping: a file is split into multiple chunks, which are "striped" across several storage servers. The goal is to allow access to different parts of a file in parallel. If the application does not benefit from this technique, then it would be more convenient to store different files on different servers. However, when it comes to organizing a distributed file system for large data centers, such as Amazon and Google, that offer services to web clients allowing multiple operations (reading, updating, deleting, ...) on a large number of files distributed among a large number of computers, then cluster-based solutions become more beneficial. Note that having a large number of computers may mean more hardware failures.[7] Two of the most widely used distributed file systems (DFS) of this type are the Google File System (GFS) and the Hadoop Distributed File System (HDFS). The file systems of both are implemented by user-level processes running on top of a standard operating system (Linux in the case of GFS).[8]

Design principles
Goals

Google File System (GFS) and Hadoop Distributed File System (HDFS) are specifically built for handling batch processing on very large data sets. For that, the following hypotheses must be taken into account:[9]

  • High availability: the cluster can contain thousands of file servers and some of them can be down at any time
  • A server belongs to a rack, a room, a data center, a country, and a continent, in order to precisely identify its geographical location
  • The size of a file can vary from many gigabytes to many terabytes. The file system should be able to support a massive number of files
  • The need to support append operations and allow file contents to be visible even while a file is being written
  • Communication is reliable among working machines: TCP/IP is used with a remote procedure call RPC communication abstraction. TCP allows the client to know almost immediately when there is a problem and a need to make a new connection.[10]
Load balancing

Load balancing is essential for efficient operation in distributed environments. It means distributing work among different servers,[11] fairly, in order to get more work done in the same amount of time and to serve clients faster. Consider a system of N chunkservers in a cloud (N being 1000, 10,000, or more) storing a certain number of files, where each file is split into several parts or chunks of fixed size (for example, 64 megabytes); the load of each chunkserver is then proportional to the number of chunks it hosts.[12] In a load-balanced cloud, resources can be used efficiently while maximizing the performance of MapReduce-based applications.
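As a toy illustration of these quantities, the chunk count of a file and the resulting per-server load can be sketched as follows; the 64 MB chunk size is the example value from the text, and the function names and balance criterion are illustrative assumptions:

```python
import math

CHUNK_SIZE = 64 * 1024 * 1024  # the fixed 64 MB chunk size used as an example above

def chunk_count(file_size: int) -> int:
    """Number of fixed-size chunks needed to store a file."""
    return max(1, math.ceil(file_size / CHUNK_SIZE))

def server_loads(placement: dict[str, list[str]]) -> dict[str, int]:
    """Load of each chunkserver, taken as the number of chunks it hosts."""
    return {server: len(chunks) for server, chunks in placement.items()}

def is_balanced(placement: dict[str, list[str]], tolerance: int = 1) -> bool:
    """Balanced if no server deviates from the mean load by more than `tolerance` chunks."""
    loads = list(server_loads(placement).values())
    mean = sum(loads) / len(loads)
    return all(abs(load - mean) <= tolerance for load in loads)
```

A 64 MB + 1 byte file, for instance, already needs two chunks, since chunk sizes are fixed.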

Load rebalancing

In a cloud computing environment, failure is the norm,[13][14] and chunkservers may be upgraded, replaced, and added to the system. Files can also be dynamically created, deleted, and appended. That leads to load imbalance in a distributed file system, meaning that the file chunks are not distributed equitably between the servers.

Distributed file systems in clouds such as GFS and HDFS rely on central or master servers or nodes (Master for GFS and NameNode for HDFS) to manage the metadata and the load balancing. The master rebalances replicas periodically: data must be moved from one DataNode/chunkserver to another if the free space on the first server falls below a certain threshold.[15] However, this centralized approach can become a bottleneck: the master servers, already heavily loaded, may become unable to manage a large number of file accesses. The load rebalance problem is NP-hard.[16]

In order to get a large number of chunkservers to work in collaboration, and to solve the problem of load balancing in distributed file systems, several approaches have been proposed, such as reallocating file chunks so that the chunks are distributed as uniformly as possible while reducing the movement cost as much as possible.[12]
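One such chunk-reallocation strategy can be sketched as a greedy loop that repeatedly moves a chunk from the most-loaded to the least-loaded server, counting the moves as the movement cost. This is only a minimal illustration of the idea, not any specific published algorithm:

```python
def rebalance(loads: dict[str, int]) -> list[tuple[str, str]]:
    """Greedily move one chunk at a time from the most- to the least-loaded
    server until no pair of servers differs by more than one chunk.
    Returns the list of (source, destination) moves; its length is the
    movement cost the text refers to."""
    loads = dict(loads)          # work on a copy of the per-server chunk counts
    moves = []
    while True:
        heaviest = max(loads, key=loads.get)
        lightest = min(loads, key=loads.get)
        if loads[heaviest] - loads[lightest] <= 1:
            return moves         # as uniform as integer chunk counts allow
        loads[heaviest] -= 1
        loads[lightest] += 1
        moves.append((heaviest, lightest))
```

For example, rebalancing servers with 6, 2, and 1 chunks evens them out at 3 chunks each in three moves.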

Google file system
Description

Google, one of the biggest internet companies, has created its own distributed file system, named Google File System (GFS), to meet the rapidly growing demands of Google's data processing needs, and it is used for all cloud services. GFS is a scalable distributed file system for data-intensive applications. It provides fault-tolerant, high-performance data storage to a large number of clients accessing it simultaneously.

GFS supports MapReduce, which allows users to create programs and run them on multiple machines without thinking about parallelization and load-balancing issues. The GFS architecture is based on a single master server serving multiple chunkservers and multiple clients.[17]

The master server, running in a dedicated node, is responsible for coordinating storage resources and managing the file metadata (the equivalent of, for example, inodes in classical file systems).[9] Each file is split into multiple chunks of 64 megabytes. Each chunk is stored on a chunk server. A chunk is identified by a chunk handle, a globally unique 64-bit number that is assigned by the master when the chunk is first created.

The master maintains all of the file metadata, including file names, directories, and the mapping of each file to the list of chunks that contain its data. This metadata, including the file-to-chunk mapping, is kept in the master server's main memory. Updates to it are logged to an operation log on disk, which is replicated onto remote machines. When the log becomes too large, a checkpoint is made, and the main-memory data is stored in a B-tree structure to facilitate mapping back into main memory.[18]
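The master's in-memory state described above (file-to-chunk mapping, chunk replica locations, and the operation log) might be sketched roughly as follows; all class and field names are illustrative assumptions, not GFS's actual data structures:

```python
import itertools

class Master:
    """Toy sketch of GFS-style master metadata."""

    def __init__(self):
        self._handles = itertools.count(1)  # stand-in for unique 64-bit chunk handles
        self.files = {}                     # file name -> ordered list of chunk handles
        self.locations = {}                 # chunk handle -> set of chunkserver ids
        self.log = []                       # append-only operation log (on disk in GFS)

    def create(self, name: str) -> None:
        self.files[name] = []
        self.log.append(("create", name))

    def add_chunk(self, name: str, servers: set[str]) -> int:
        handle = next(self._handles)        # the master assigns the handle at creation
        self.files[name].append(handle)
        self.locations[handle] = set(servers)
        self.log.append(("add_chunk", name, handle))
        return handle

    def lookup(self, name: str, chunk_index: int) -> set[str]:
        """Clients contact the master only for metadata: which servers hold a chunk."""
        return self.locations[self.files[name][chunk_index]]
```

All actual file data would then be exchanged directly between clients and chunkservers, never through the master.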

Fault tolerance

To facilitate fault tolerance, each chunk is replicated onto multiple (three by default) chunk servers.[19] A chunk is available on at least one chunk server. The advantage of this scheme is simplicity. The master is responsible for allocating the chunk servers for each chunk and is contacted only for metadata information. For all other data, the client has to interact with the chunk servers.

The master keeps track of where a chunk is located. However, it does not attempt to maintain the chunk locations precisely but only occasionally contacts the chunk servers to see which chunks they have stored.[20] This allows for scalability, and helps prevent bottlenecks due to increased workload.[21]

In GFS, most files are modified by appending new data and not overwriting existing data. Once written, the files are usually only read sequentially rather than randomly, and that makes this DFS the most suitable for scenarios in which many large files are created once but read many times.[22][23]

File processing

When a client wants to write to or update a file, the master will assign a replica, which will be the primary replica if it is the first modification. The process of writing is composed of two steps:[9]

  • Sending: First, and by far the most important, the client contacts the master to find out which chunk servers hold the data. The client is given a list of replicas identifying the primary and secondary chunk servers. The client then contacts the nearest replica chunk server, and sends the data to it. This server will send the data to the next closest one, which then forwards it to yet another replica, and so on. The data is then propagated and cached in memory but not yet written to a file.
  • Writing: When all the replicas have received the data, the client sends a write request to the primary chunk server, identifying the data that was sent in the sending phase. The primary server will then assign a sequence number to the write operations that it has received, apply the writes to the file in serial-number order, and forward the write requests in that order to the secondaries. Meanwhile, the master is kept out of the loop.

Consequently, we can differentiate two types of flows: the data flow and the control flow. The data flow is associated with the sending phase and the control flow is associated with the writing phase. This ensures that the primary chunk server takes control of the write order. Note that when the master assigns the write operation to a replica, it increments the chunk version number and informs all of the replicas containing that chunk of the new version number. Chunk version numbers allow for update error-detection, if a replica wasn't updated because its chunk server was down.[24]
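The two phases can be sketched as follows, with the data flow modeled as pushed data cached in memory and the control flow as the primary assigning serial numbers. This is a simplified model under assumed names, not the real GFS protocol:

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.cache = {}   # data pushed during the sending phase, not yet written
        self.chunk = []   # applied writes, in the order the primary assigned

def send(data_id: str, data: bytes, replicas: list[Replica]) -> None:
    """Sending phase (data flow): the data is pushed along the chain of
    replicas, nearest first, and cached in memory at each one."""
    for replica in replicas:  # each replica would forward to the next in a real pipeline
        replica.cache[data_id] = data

def write(data_id: str, primary: Replica, secondaries: list[Replica]) -> None:
    """Writing phase (control flow): the primary assigns a serial number and
    forwards the write in that order to the secondaries; the master is not involved."""
    seq = len(primary.chunk)
    for replica in [primary] + secondaries:
        replica.chunk.append((seq, replica.cache.pop(data_id)))
```

Because every replica applies writes in the primary's serial order, all replicas of a chunk end up identical.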

Some new Google applications did not work well with the 64-megabyte chunk size. To solve that problem, GFS started, in 2004, to implement the Bigtable approach.[25]

Hadoop distributed file system

HDFS, developed by the Apache Software Foundation, is a distributed file system designed to hold very large amounts of data (terabytes or even petabytes). Its architecture is similar to that of GFS, i.e. a master/slave architecture. HDFS is normally installed on a cluster of computers. The design concept of Hadoop is informed by Google's: Google File System, Google MapReduce, and Bigtable are implemented by the Hadoop Distributed File System (HDFS), Hadoop MapReduce, and HBase respectively.[26] Like GFS, HDFS is suited for scenarios with write-once-read-many file access, and supports file appends and truncates instead of random reads and writes, to simplify data coherency issues.[27]

An HDFS cluster consists of a single NameNode and several DataNode machines. The NameNode, a master server, manages and maintains the metadata of the storage DataNodes in its RAM. DataNodes manage the storage attached to the nodes that they run on. The NameNode and DataNode are software components designed to run on everyday-use machines, which typically run under a GNU/Linux OS. HDFS can run on any machine that supports Java, and such a machine can therefore run either the NameNode or the DataNode software.[28]

On an HDFS cluster, a file is split into one or more equal-size blocks, except that the last block may be smaller. Each block is stored on multiple DataNodes to guarantee availability; by default, each block is replicated three times, a process called "Block Level Replication".[29]
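Block splitting and block-level replication can be sketched as follows; the block size and function names are illustrative, and real HDFS tracks replicas by DataNode placement rather than by copying bytes:

```python
DEFAULT_REPLICATION = 3  # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a file into equal-size blocks; only the last block may be smaller."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]

def replicate(blocks: list[bytes], factor: int = DEFAULT_REPLICATION) -> list[list[bytes]]:
    """Block-level replication: each block gets `factor` copies, one per DataNode."""
    return [[block] * factor for block in blocks]
```

A 10-byte file with a 4-byte block size, for instance, yields blocks of 4, 4, and 2 bytes, each stored in triplicate.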

The NameNode manages the file system namespace operations such as opening, closing, and renaming files and directories, and regulates file access. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for servicing read and write requests from the file system’s clients, managing the block allocation or deletion, and replicating blocks.[30]

When a client wants to read or write data, it contacts the NameNode and the NameNode checks where the data should be read from or written to. After that, the client has the location of the DataNode and can send read or write requests to it.

HDFS is also characterized by its support for data rebalancing schemes. In general, managing the free space on a DataNode is very important: data must be moved from one DataNode to another if free space is not adequate, and in the case of creating additional replicas, data should be moved to assure system balance.[29]

Other examples

Distributed file systems can be optimized for different purposes. Some, such as those designed for internet services, including GFS, are optimized for scalability. Other designs for distributed file systems support performance-intensive applications usually executed in parallel.[31] Some examples include: MapR File System (MapR-FS), Ceph-FS, Fraunhofer File System (BeeGFS), Lustre File System, IBM General Parallel File System (GPFS), and Parallel Virtual File System.

MapR-FS is a distributed file system that is the basis of the MapR Converged Platform, with capabilities for distributed file storage, a NoSQL database with multiple APIs, and an integrated message streaming system. MapR-FS is optimized for scalability, performance, reliability, and availability. Its file storage capability is compatible with the Apache Hadoop Distributed File System (HDFS) API but with several design characteristics that distinguish it from HDFS. Among the most notable differences are that MapR-FS is a fully read/write filesystem with metadata for files and directories distributed across the namespace, so there is no NameNode.[32][33][34][35][36]

Ceph-FS is a distributed file system that provides excellent performance and reliability.[37] It answers the challenges of dealing with huge files and directories, coordinating the activity of thousands of disks, providing parallel access to metadata on a massive scale, handling both scientific and general-purpose workloads, authenticating and encrypting on a large scale, and growing or shrinking dynamically due to frequent device decommissioning, device failures, and cluster expansions.[38]

BeeGFS is the high-performance parallel file system from the Fraunhofer Competence Centre for High Performance Computing. The distributed metadata architecture of BeeGFS has been designed to provide the scalability and flexibility needed to run HPC and similar applications with high I/O demands.[39]

Lustre File System has been designed and implemented to deal with the issue of bottlenecks traditionally found in distributed systems. Lustre is characterized by its efficiency, scalability, and redundancy.[40] GPFS was also designed with the goal of removing such bottlenecks.[41]

Communication

High performance of distributed file systems requires efficient communication between computing nodes and fast access to the storage systems. Operations such as open, close, read, write, send, and receive need to be fast, to ensure that performance. For example, each read or write request accesses disk storage, which introduces seek, rotational, and network latencies.[42]

The data communication (send/receive) operations transfer data from the application buffer to the machine kernel, with TCP controlling the process from within the kernel. However, in case of network congestion or errors, TCP may not send the data immediately. When transferring data from a buffer in the kernel to the application, the machine does not read the byte stream directly from the remote machine; in fact, TCP is responsible for buffering the data for the application.[43]

Choosing the buffer size for file reading and writing, or file sending and receiving, is done at the application level. The buffer is maintained as a circular linked list[44] consisting of a set of BufferNodes. Each BufferNode has a DataField, which contains the data, and a pointer called NextBufferNode that points to the next BufferNode. To find the current position, two pointers are used, CurrentBufferNode and EndBufferNode, which mark the last write and last read positions in the buffer. If a BufferNode has no free space, it sends a wait signal to the client, telling it to wait until space is available.[45]
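A minimal sketch of this circular buffer, using the names from the description above (BufferNode, DataField, CurrentBufferNode, EndBufferNode) and modeling the wait signal as a boolean return value:

```python
class BufferNode:
    def __init__(self):
        self.data_field = None  # DataField: holds the data; None means the node is free
        self.next = None        # NextBufferNode: pointer to the next node in the ring

class CircularBuffer:
    """Fixed-size ring of BufferNodes with separate write and read pointers."""

    def __init__(self, size: int):
        nodes = [BufferNode() for _ in range(size)]
        for node, successor in zip(nodes, nodes[1:] + nodes[:1]):
            node.next = successor   # close the ring
        self.current = nodes[0]     # CurrentBufferNode: next write position
        self.end = nodes[0]         # EndBufferNode: next read position

    def put(self, data) -> bool:
        """Returns False (the 'wait' signal) when the node has no free space."""
        if self.current.data_field is not None:
            return False
        self.current.data_field = data
        self.current = self.current.next
        return True

    def get(self):
        """Read and free the oldest occupied node."""
        data, self.end.data_field = self.end.data_field, None
        self.end = self.end.next
        return data
```

A writer that receives False simply waits until a reader frees a node, exactly as the wait signal is described above.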

Cloud-based Synchronization of Distributed File System

More and more users have multiple devices with ad hoc connectivity. The data sets replicated on these devices need to be synchronized among an arbitrary number of servers. This is useful for backups and also for offline operation: when a user's network conditions are not good, the user device selectively replicates a part of the data that will be modified later, offline. Once network conditions become good, the device is synchronized.[46] Two approaches exist to tackle the distributed synchronization issue: user-controlled peer-to-peer synchronization and cloud master-replica synchronization.[46]

  • user-controlled peer-to-peer: software such as rsync must be installed on all of the users' computers that contain their data. Files are synchronized peer-to-peer, with users specifying network addresses and synchronization parameters; this is thus a manual process.
  • cloud master-replica synchronization: widely used by cloud services, in which a master replica is maintained in the cloud, and all updates and synchronization operations are applied to this master copy, offering a high level of availability and reliability in case of failures.
Security keys

In cloud computing, the most important security concepts are confidentiality, integrity, and availability ("CIA"). Confidentiality becomes indispensable in order to keep private data from being disclosed. Integrity ensures that data is not corrupted.[47]

Confidentiality

Confidentiality means that data and computation tasks are confidential: neither cloud provider nor other clients can access the client's data. Much research has been done about confidentiality, because it is one of the crucial points that still presents challenges for cloud computing. A lack of trust in the cloud providers is also a related issue.[48] The infrastructure of the cloud must ensure that customers' data will not be accessed by unauthorized parties.

The environment becomes insecure if the service provider can do all of the following:[49]

  • locate the consumer's data in the cloud
  • access and retrieve consumer's data
  • understand the meaning of the data (types of data, functionalities and interfaces of the application and format of the data).

The geographic location of data helps determine privacy and confidentiality. The location of clients should be taken into account. For example, clients in Europe won't be interested in using datacenters located in the United States, because that affects the guarantee of the confidentiality of data. In order to deal with that problem, some cloud computing vendors have included the geographic location of the host as a parameter of the service-level agreement made with the customer,[50] allowing users to choose themselves the locations of the servers that will host their data.

Another approach to confidentiality involves data encryption;[51] otherwise, there is a serious risk of unauthorized use. A variety of solutions exists, such as encrypting only sensitive data[52] and supporting only some operations, in order to simplify computation.[53] Furthermore, cryptographic techniques and tools such as fully homomorphic encryption (FHE) are used to preserve privacy in the cloud.[47]

Integrity

Integrity in cloud computing implies data integrity as well as computing integrity. Such integrity means that data has to be stored correctly on cloud servers and, in case of failures or incorrect computing, that problems have to be detected.

Data integrity can be affected by malicious events or by administration errors (e.g. during backup and restore, data migration, or changing memberships in P2P systems).[54]

Integrity is easy to achieve using cryptography (typically through message authentication codes (MACs) on data blocks).[55]
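A per-block MAC scheme of this kind can be sketched with a standard HMAC; this is a generic illustration, not any particular system's implementation:

```python
import hmac
import hashlib

def tag_blocks(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """Compute a MAC per data block before uploading; the tags are kept by the
    client (or stored alongside the blocks, depending on the scheme)."""
    return [hmac.new(key, block, hashlib.sha256).digest() for block in blocks]

def verify_block(key: bytes, block: bytes, tag: bytes) -> bool:
    """Recompute the MAC on a retrieved block and compare in constant time."""
    expected = hmac.new(key, block, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

Any modification of a stored block by the server then fails verification on retrieval, which is exactly the integrity guarantee described above.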

There exist checking mechanisms that verify data integrity. For instance:

  • HAIL (High-Availability and Integrity Layer) is a distributed cryptographic system that allows a set of servers to prove to a client that a stored file is intact and retrievable.[56]
  • PORs (proofs of retrievability for large files)[57] are based on a symmetric cryptographic system, with a single verification key. The method encrypts a file F and then generates random blocks called "sentinels" that are added to the end of the encrypted file. The server cannot locate the sentinels, which are impossible to differentiate from the other blocks, so checking them reveals whether the file has been changed.
  • PDP (provable data possession) checking is a class of efficient and practical methods that provide an efficient way to check data integrity on untrusted servers:
    • PDP:[58] Before storing the data on a server, the client must store, locally, some meta-data. At a later time, and without downloading data, the client is able to ask the server to check that the data has not been falsified. This approach is used for static data.
    • Scalable PDP:[59] This approach is based on a symmetric key, which is more efficient than public-key encryption. It supports some dynamic operations (modification, deletion, and append) but it cannot be used for public verification.
    • Dynamic PDP:[60] This approach extends the PDP model to support several update operations such as append, insert, modify, and delete, which is well suited for intensive computation.
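The sentinel idea above can be illustrated with a toy sketch in which the secret key seeds a pseudorandom generator that produces the sentinels. A real scheme would encrypt the file and hide sentinel positions among the blocks; both are elided here, and all names are illustrative:

```python
import random

BLOCK = 16  # toy block size in bytes

def add_sentinels(file_blocks: list[bytes], n: int, key: int) -> list[bytes]:
    """Append n key-derived random sentinel blocks. After encryption (omitted),
    sentinels would be indistinguishable from data blocks to the server."""
    rng = random.Random(key)  # the key seeds the sentinel generator
    sentinels = [bytes(rng.getrandbits(8) for _ in range(BLOCK)) for _ in range(n)]
    return file_blocks + sentinels

def audit(stored: list[bytes], n: int, key: int) -> bool:
    """The verifier regenerates the sentinels from the key and spot-checks them;
    a modified or truncated file fails with high probability."""
    rng = random.Random(key)
    expected = [bytes(rng.getrandbits(8) for _ in range(BLOCK)) for _ in range(n)]
    return stored[-n:] == expected
```

Because only the key holder can regenerate the sentinels, the server cannot selectively preserve them while discarding or altering the rest of the file.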
Availability

Availability is generally achieved through replication.[61][62][63][64] Meanwhile, consistency must be guaranteed. However, consistency and availability cannot both be achieved at the same time; each is prioritized at some sacrifice of the other, so a balance must be struck.[65]

Data must have an identity to be accessible. For instance, Skute[61] is a mechanism based on key/value storage that allows dynamic data allocation in an efficient way. Each server is identified by a label in the form continent-country-datacenter-room-rack-server. A server can reference multiple virtual nodes, each node holding a selection of data (or multiple partitions of multiple data). Each piece of data is identified by a key space, which is generated by a one-way cryptographic hash function (e.g. MD5) and is localised by the hash function value of this key. The key space may be partitioned into multiple partitions, with each partition referring to a piece of data. To perform replication, virtual nodes must be replicated and referenced by other servers. To maximize data durability and data availability, the replicas must be placed on different servers, and every server should be in a different geographical location, because data availability increases with geographical diversity. The process of replication includes an evaluation of space availability, which must be above a certain minimum threshold on each chunk server; otherwise, data is replicated to another chunk server. Each partition, i, has an availability value represented by the following formula:

avail_i = \sum_{i=0}^{|s_i|} \sum_{j=i+1}^{|s_i|} conf_i \cdot conf_j \cdot diversity(s_i, s_j)

where s_i are the servers hosting the replicas, conf_i and conf_j are the confidence values of servers i and j (relying on technical factors such as hardware components and non-technical ones such as the economic and political situation of a country), and diversity(s_i, s_j) is the geographical distance between s_i and s_j.[66]
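The availability score can be computed directly from this formula; in the minimal sketch below, the per-server confidence values and the pairwise diversity function are treated as given inputs, and the variable names are illustrative:

```python
def availability(conf: list[float], diversity) -> float:
    """Sum over all replica pairs (i, j) with i < j of
    conf[i] * conf[j] * diversity(i, j), as in the formula above."""
    n = len(conf)
    return sum(conf[i] * conf[j] * diversity(i, j)
               for i in range(n) for j in range(i + 1, n))
```

With a single replica there are no pairs and the score is zero, which reflects the intuition that availability comes from having replicas on diverse, trusted servers.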

Replication is a great solution to ensure data availability, but it costs too much in terms of memory space.[67] DiskReduce[67] is a modified version of HDFS that is based on RAID technology (RAID-5 and RAID-6) and allows asynchronous encoding of replicated data: a background process looks for widely replicated data and deletes the extra copies after encoding them. Another approach is to replace replication with erasure coding.[68] In addition, to ensure data availability there are many approaches that allow for data recovery: data is coded, and if part of it is lost, it can be recovered from fragments constructed during the coding phase.[69] Some other approaches that apply different mechanisms to guarantee availability are the Reed–Solomon code of Microsoft Azure and RaidNode for HDFS. Google is also working on a new approach based on an erasure-coding mechanism.[70]
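The RAID-5-style idea that DiskReduce builds on can be illustrated with plain XOR parity: one parity block replaces the extra replicas and still allows any single lost block to be rebuilt. This is a simplification; real systems use RAID-6 or Reed–Solomon codes to tolerate more than one failure:

```python
def parity(blocks: list[bytes]) -> bytes:
    """XOR of all equal-size blocks: the single parity block kept in place
    of full extra replicas in a RAID-5-style scheme."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def recover(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Rebuild the one missing block: XOR of the survivors and the parity
    cancels everything except the lost block."""
    return parity(surviving + [parity_block])
```

The storage cost drops from N copies per block to one parity block per stripe, at the price of a reconstruction computation on failure.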

There is, however, no RAID implementation designed specifically for cloud storage.[68]

Economic aspects

The cloud computing economy is growing rapidly: US government cloud spending has a compound annual growth rate (CAGR) of 40% and is expected to reach 7 billion dollars by 2015.[71]

More and more companies have been utilizing cloud computing to manage the massive amount of data and to overcome the lack of storage capacity, and because it enables them to use such resources as a service, ensuring that their computing needs will be met without having to invest in infrastructure (Pay-as-you-go model).[72]

Every application provider has to periodically pay the cost of each server where replicas of data are stored. The cost of a server is determined by the quality of the hardware, the storage capacities, and its query-processing and communication overhead.[73] Cloud computing allows providers to scale their services according to client demands.

The pay-as-you-go model has also eased the burden on startup companies that wish to benefit from compute-intensive business. Cloud computing also offers an opportunity to many third-world countries that wouldn't have such computing resources otherwise. Cloud computing can lower IT barriers to innovation.[74]

Despite the wide utilization of cloud computing, efficient sharing of large volumes of data in an untrusted cloud is still a challenge.

References
  1. Sun microsystem, p. 1
  2. Fabio Kon, p. 1
  3. Kobayashi et al. 2011, p. 1
  4. Angabini et al. 2011, p. 1
  5. Di Sano et al. 2012, p. 2
  6. Andrew & Maarten 2006, p. 492
  7. Andrew & Maarten 2006, p. 496
  8. Humbetov 2012, p. 2
  9. Krzyzanowski 2012, p. 2
  10. Pavel Bžoch, p. 7
  11. Kai et al. 2013, p. 23
  12. Hsiao et al. 2013, p. 2
  13. Hsiao et al. 2013, p. 952
  14. Ghemawat, Gobioff & Leung 2003, p. 1
  15. Ghemawat, Gobioff & Leung 2003, p. 8
  16. Hsiao et al. 2013, p. 953
  17. Di Sano et al. 2012, pp. 1–2
  18. Krzyzanowski 2012, p. 4
  19. Di Sano et al. 2012, p. 2
  20. Andrew & Maarten 2006, p. 497
  21. Humbetov 2012, p. 3
  22. Humbetov 2012, p. 5
  23. Andrew & Maarten 2006, p. 498
  24. Krzyzanowski 2012, p. 5
  25. [1]
  26. Fan-Hsun et al. 2012, p. 2
  27. http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Assumptions_and_Goals
  28. Azzedin 2013, p. 2
  29. Adamov 2012, p. 2
  30. Yee & Thu Naing 2011, p. 122
  31. Soares et al. 2013, p. 158
  32. Perez, Nicolas. "How MapR improves our productivity and simplifies our design". Medium. Medium. Retrieved June 21, 2016.
  33. Woodie, Alex. "From Hadoop to Zeta: Inside MapR's Convergence Conversion". Datanami. Tabor Communications Inc. Retrieved June 21, 2016.
  34. Brennan, Bob. "Flash Memory Summit". youtube. Samsung. Retrieved June 21, 2016.
  35. Srivas, MC. "MapR File System". Hadoop Summit 2011. Hortonworks. Retrieved June 21, 2016.
  36. Dunning, Ted; Friedman, Ellen (January 2015). "Chapter 3: Understanding the MapR Distribution for Apache Hadoop". Real World Hadoop (First ed.). Sebastopol, CA: O'Reilly Media, Inc. pp. 23–28. ISBN 978-1-4919-2395-5. Retrieved June 21, 2016.
  37. Weil et al. 2006, p. 307
  38. Maltzahn et al. 2010, p. 39
  39. Jacobi & Lingemann, p. 10
  40. Schwan Philip 2003, p. 401
  41. Jones, Koniges & Yates 2000, p. 1
  42. Upadhyaya et al. 2008, p. 400
  43. Upadhyaya et al. 2008, p. 403
  44. Upadhyaya et al. 2008, p. 401
  45. Upadhyaya et al. 2008, p. 402
  46. Uppoor, Flouris & Bilas 2010, p. 1
  47. Zhifeng & Yang 2013, p. 854
  48. Zhifeng & Yang 2013, pp. 845–846
  49. Yau & An 2010, p. 353
  50. Vecchiola, Pandey & Buyya 2009, p. 14
  51. Yau & An 2010, p. 352
  52. Miranda & Siani 2009
  53. Naehrig & Lauter 2013
  54. Zhifeng & Yang 2013, p. 5
  55. Juels & Oprea 2013, p. 4
  56. Bowers, Juels & Oprea 2009
  57. Juels & S. Kaliski 2007, p. 2
  58. Ateniese et al.
  59. Ateniese et al. 2008, pp. 5, 9
  60. Erway et al. 2009, p. 2
  61. Bonvin, Papaioannou & Aberer 2009, p. 206
  62. Cuong et al. 2012, p. 5
  63. A., A. & P. 2011, p. 3
  64. Qian, D. & T. 2011, p. 3
  65. Vogels 2009, p. 2
  66. Bonvin, Papaioannou & Aberer 2009, p. 208
  67. Carnegie et al. 2009, p. 1
  68. Wang et al. 2012, p. 1
  69. Abu-Libdeh, Princehouse & Weatherspoon 2010, p. 2
  70. Wang et al. 2012, p. 9
  71. Lori M. Kaufman 2009, p. 2
  72. Angabini et al. 2011, p. 1
  73. Bonvin, Papaioannou & Aberer 2009, p. 3
  74. Marston et al. 2011, p. 3
Bibliography
  1. Architecture, structure, and design:
    • Zhang, Qi-fei; Pan, Xue-zeng; Shen, Yan; Li, Wen-juan (2012). "A Novel Scalable Architecture of Cloud Storage System for Small Files Based on P2P". 2012 IEEE International Conference on Cluster Computing Workshops. Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China. p. 41. doi:10.1109/ClusterW.2012.27. ISBN 978-0-7695-4844-9.
    • Azzedin, Farag (2013). "Towards a scalable HDFS architecture". 2013 International Conference on Collaboration Technologies and Systems (CTS). Information and Computer Science Department King Fahd University of Petroleum and Minerals. pp. 155–161. doi:10.1109/CTS.2013.6567222. ISBN 978-1-4673-6404-1.
    • Krzyzanowski, Paul (2012). "Distributed File Systems" (PDF).
    • Kobayashi, K; Mikami, S; Kimura, H; Tatebe, O (2011). The Gfarm File System on Compute Clouds. Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW), 2011 IEEE International Symposium on. Grad. Sch. of Syst. & Inf. Eng., Univ. of Tsukuba, Tsukuba, Japan. doi:10.1109/IPDPS.2011.255.
    • Humbetov, Shamil (2012). "Data-intensive computing with map-reduce and hadoop". 2012 6th International Conference on Application of Information and Communication Technologies (AICT). Department of Computer Engineering Qafqaz University Baku, Azerbaijan. pp. 1–5. doi:10.1109/ICAICT.2012.6398489. ISBN 978-1-4673-1740-5.
    • Hsiao, Hung-Chang; Chung, Hsueh-Yi; Shen, Haiying; Chao, Yu-Chang (2013). National Cheng Kung University, Tainan. "Load Rebalancing for Distributed File Systems in Clouds". Parallel and Distributed Systems, IEEE Transactions on. 24 (5): 951–962. doi:10.1109/TPDS.2012.196.
    • Kai, Fan; Dayang, Zhang; Hui, Li; Yintang, Yang (2013). "An Adaptive Feedback Load Balancing Algorithm in HDFS". 2013 5th International Conference on Intelligent Networking and Collaborative Systems. State Key Lab. of Integrated Service Networks, Xidian Univ., Xi'an, China. pp. 23–29. doi:10.1109/INCoS.2013.14. ISBN 978-0-7695-4988-0.
    • Upadhyaya, B; Azimov, F; Doan, T.T; Choi, Eunmi; Kim, Sangbum; Kim, Pilsung (2008). "Distributed File System: Efficiency Experiments for Data Access and Communication". 2008 Fourth International Conference on Networked Computing and Advanced Information Management. Sch. of Bus. IT, Kookmin Univ., Seoul. pp. 400–405. doi:10.1109/NCM.2008.164. ISBN 978-0-7695-3322-3.
    • Soares, Tiago S.; Dantas, M.A.R; de Macedo, Douglas D.J.; Bauer, Michael A (2013). "A Data Management in a Private Cloud Storage Environment Utilizing High Performance Distributed File Systems". 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises. Inf. & Statistic Dept. (INE), Fed. Univ. of Santa Catarina (UFSC), Florianopolis, Brazil. pp. 158–163. doi:10.1109/WETICE.2013.12. ISBN 978-1-4799-0405-1.
    • Adamov, Abzetdin (2012). "Distributed file system as a basis of data-intensive computing". 2012 6th International Conference on Application of Information and Communication Technologies (AICT). Comput. Eng. Dept., Qafqaz Univ., Baku, Azerbaijan. pp. 1–3. doi:10.1109/ICAICT.2012.6398484. ISBN 978-1-4673-1740-5.
    • Schwan, Philip (2003). Cluster File Systems, Inc. "Lustre: Building a File System for 1,000-node Clusters" (PDF). Proceedings of the 2003 Linux Symposium: 400–407.
    • Jones, Terry; Koniges, Alice; Yates, R. Kim (2000). Lawrence Livermore National Laboratory. "Performance of the IBM General Parallel File System" (PDF). Parallel and Distributed Processing Symposium, 2000. IPDPS 2000. Proceedings. 14th International.
    • Weil, Sage A.; Brandt, Scott A.; Miller, Ethan L.; Long, Darrell D. E. (2006). "Ceph: A Scalable, High-Performance Distributed File System" (PDF). University of California, Santa Cruz.
    • Maltzahn, Carlos; Molina-Estolano, Esteban; Khurana, Amandeep; Nelson, Alex J.; Brandt, Scott A.; Weil, Sage (2010). "Ceph as a scalable alternative to the Hadoop Distributed FileSystem" (PDF).
    • S.A., Brandt; E.L., Miller; D.D.E., Long; Lan, Xue (2003). "Efficient metadata management in large distributed storage systems". 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings. Storage Syst. Res. Center, California Univ., Santa Cruz, CA, USA. pp. 290–298. doi:10.1109/MASS.2003.1194865. ISBN 0-7695-1914-8.
    • Garth A., Gibson; Rodney, MVan Meter (November 2000). "Network attached storage architecture" (PDF). Communications of the ACM. 43 (11). doi:10.1145/353360.353362.
    • Yee, Tin Tin; Thu Naing, Thinn (2011). "PC-Cluster based Storage System Architecture for Cloud Storage". arXiv:1112.2025.
    • Cho Cho, Khaing; Thinn Thu, Naing (2011). "The efficient data storage management system on cluster-based private cloud data center". 2011 IEEE International Conference on Cloud Computing and Intelligence Systems. pp. 235–239. doi:10.1109/CCIS.2011.6045066. ISBN 978-1-61284-203-5.
    • S.A., Brandt; E.L., Miller; D.D.E., Long; Lan, Xue (2011). "A carrier-grade service-oriented file storage architecture for cloud computing". 2011 3rd Symposium on Web Society. PCN&CAD Center, Beijing Univ. of Posts & Telecommun., Beijing, China. pp. 16–20. doi:10.1109/SWS.2011.6101263. ISBN 978-1-4577-0211-2.
    • Ghemawat, Sanjay; Gobioff, Howard; Leung, Shun-Tak (2003). "The Google file system". Proceedings of the nineteenth ACM symposium on Operating systems principles – SOSP '03. pp. 29–43. doi:10.1145/945445.945450. ISBN 1-58113-757-5.
  2. Security
    • Vecchiola, C; Pandey, S; Buyya, R (2009). "High-Performance Cloud Computing: A View of Scientific Applications". 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks. Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia. pp. 4–16. doi:10.1109/I-SPAN.2009.150. ISBN 978-1-4244-5403-7.
    • Miranda, Mowbray; Siani, Pearson (2009). "A client-based privacy manager for cloud computing". Proceedings of the Fourth International ICST Conference on COMmunication System softWAre and middlewaRE – COMSWARE '09. p. 1. doi:10.1145/1621890.1621897. ISBN 978-1-60558-353-2.
    • Naehrig, Michael; Lauter, Kristin (2013). "Can homomorphic encryption be practical?". Proceedings of the 3rd ACM workshop on Cloud computing security workshop – CCSW '11. pp. 113–124. doi:10.1145/2046660.2046682. ISBN 978-1-4503-1004-8.
    • Du, Hongtao; Li, Zhanhuai (2012). "PsFS: A high-throughput parallel file system for secure Cloud Storage system". 2012 International Conference on Measurement, Information and Control (MIC). 1. Comput. Coll., Northwestern Polytech. Univ., Xi'An, China. pp. 327–331. doi:10.1109/MIC.2012.6273264. ISBN 978-1-4577-1604-1.
    • A.Brandt, Scott; L.Miller, Ethan; D.E.Long, Darrell; Xue, Lan (2003). Storage Systems Research Center University of California, Santa Cruz. "Efficient Metadata Management in Large Distributed Storage Systems" (PDF). 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA.
    • Lori M. Kaufman (2009). "Data Security in the World of Cloud Computing". Security & Privacy, IEEE. 7 (4): 161–64. doi:10.1109/MSP.2009.87.
    • Bowers, Kevin; Juels, Ari; Oprea, Alina (2009). "HAIL: a high-availability and integrity layer for cloud storage". Proceedings of the 16th ACM conference on Computer and communications security: 187–198. doi:10.1145/1653662.1653686. ISBN 978-1-60558-894-0.
    • Juels, Ari; Oprea, Alina (February 2013). "New approaches to security and availability for cloud data". Communications of the ACM. 56 (2): 64–73. doi:10.1145/2408776.2408793.
    • Zhang, Jing; Wu, Gongqing; Hu, Xuegang; Wu, Xindong (2012). "A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services". 2012 ACM/IEEE 13th International Conference on Grid Computing. Dept. of Comput. Sci., Hefei Univ. of Technol., Hefei, China. pp. 12–21. doi:10.1109/Grid.2012.17. ISBN 978-1-4673-2901-9.
    • A., Pan; J.P., Walters; V.S., Pai; D.-I.D., Kang; S.P., Crago (2012). "Integrating High Performance File Systems in a Cloud Computing Environment". 2012 SC Companion: High Performance Computing, Networking Storage and Analysis. Dept. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA. pp. 753–759. doi:10.1109/SC.Companion.2012.103. ISBN 978-0-7695-4956-9.
    • Fan-Hsun, Tseng; Chi-Yuan, Chen; Li-Der, Chou; Han-Chieh, Chao (2012). "Implement a reliable and secure cloud distributed file system". 2012 International Symposium on Intelligent Signal Processing and Communications Systems. Dept. of Comput. Sci. & Inf. Eng., Nat. Central Univ., Taoyuan, Taiwan. pp. 227–232. doi:10.1109/ISPACS.2012.6473485. ISBN 978-1-4673-5082-2.
    • Di Sano, M; Di Stefano, A; Morana, G; Zito, D (2012). "File System As-a-Service: Providing Transient and Consistent Views of Files to Cooperating Applications in Clouds". 2012 IEEE 21st International Workshop on Enabling Technologies: Infrastructure for Collaborative Enterprises. Dept. of Electr., Electron. & Comput. Eng., Univ. of Catania, Catania, Italy. pp. 173–178. doi:10.1109/WETICE.2012.104. ISBN 978-1-4673-1888-4.
    • Zhifeng, Xiao; Yang, Xiao (2013). "Security and Privacy in Cloud Computing". Communications Surveys & Tutorials, IEEE. 15 (2): 843–859. doi:10.1109/SURV.2012.060912.00182.
    • Horrigan, John B. (2008). "Use of cloud computing applications and services" (PDF).
    • Yau, Stephen; An, Ho (2010). "Confidentiality Protection in cloud computing systems". Int J Software Informatics: 351–365.
    • Carnegie, Bin Fan; Tantisiriroj, Wittawat; Xiao, Lin; Gibson, Garth (2009). "DiskReduce: RAID for data-intensive scalable computing". pp. 6–10. doi:10.1145/1713072.1713075. ISBN 978-1-60558-883-4.
    • Wang, Jianzong; Gong, Weijiao; P., Varman; Xie, Changsheng (2012). "Reducing Storage Overhead with Small Write Bottleneck Avoiding in Cloud RAID System". 2012 ACM/IEEE 13th International Conference on Grid Computing. pp. 174–183. doi:10.1109/Grid.2012.29. ISBN 978-1-4673-2901-9.
    • Abu-Libdeh, Hussam; Princehouse, Lonnie; Weatherspoon, Hakim (2010). "RACS: a case for cloud storage diversity". SoCC '10 Proceedings of the 1st ACM symposium on Cloud computing: 229–240. doi:10.1145/1807128.1807165. ISBN 978-1-4503-0036-0.
    • Vogels, Werner (2009). "Eventually consistent". Communications of the ACM. 52 (1): 40–44. doi:10.1145/1435417.1435432.
    • Cuong, Pham; Cao, Phuong; Kalbarczyk, Z; Iyer, R.K (2012). "Toward a high availability cloud: Techniques and challenges". IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012). pp. 1–6. doi:10.1109/DSNW.2012.6264687. ISBN 978-1-4673-2266-9.
    • A., Undheim; A., Chilwan; P., Heegaard (2011). "Differentiated Availability in Cloud Computing SLAs". 2011 IEEE/ACM 12th International Conference on Grid Computing. pp. 129–136. doi:10.1109/Grid.2011.25. ISBN 978-1-4577-1904-2.
    • Qian, Haiyang; D., Medhi; T., Trivedi (2011). "A hierarchical model to evaluate quality of experience of online services hosted by cloud computing". pp. 105–112. doi:10.1109/INM.2011.5990680.
    • Ateniese, Giuseppe; Burns, Randal; Curtmola, Reza; Herring, Joseph; Kissner, Lea; Peterson, Zachary; Song, Dawn (2007). "Provable data possession at untrusted stores". Proceedings of the 14th ACM conference on Computer and communications security – CCS '07. pp. 598–609. doi:10.1145/1315245.1315318. ISBN 978-1-59593-703-2.
    • Ateniese, Giuseppe; Di Pietro, Roberto; V. Mancini, Luigi; Tsudik, Gene (2008). "Scalable and efficient provable data possession". Proceedings of the 4th international conference on Security and privacy in communication networks – Secure Comm '08. p. 1. doi:10.1145/1460877.1460889. ISBN 978-1-60558-241-2.
    • Erway, Chris; Küpçü, Alptekin; Tamassia, Roberto; Papamanthou, Charalampos (2009). "Dynamic provable data possession". Proceedings of the 16th ACM conference on Computer and communications security – CCS '09. pp. 213–222. doi:10.1145/1653662.1653688. ISBN 978-1-60558-894-0.
    • Juels, Ari; S. Kaliski, Burton (2007). "Pors: proofs of retrievability for large files". Proceedings of the 14th ACM conference on Computer and communications: 584–597. doi:10.1145/1315245.1315317. ISBN 978-1-59593-703-2.
    • Bonvin, Nicolas; Papaioannou, Thanasis; Aberer, Karl (2009). "A self-organized, fault-tolerant and scalable replication scheme for cloud storage". Proceedings of the 1st ACM symposium on Cloud computing – SoCC '10. pp. 205–216. doi:10.1145/1807128.1807162. ISBN 978-1-4503-0036-0.
    • Kraska, Tim; Hentschel, Martin; Alonso, Gustavo; Kossmann, Donald (2009). "Consistency rationing in the cloud: pay only when it matters". Proceedings of the VLDB Endowment. 2 (1): 253–264. doi:10.14778/1687627.1687657.
    • Abadi, Daniel J. (2009). "Data Management in the Cloud: Limitations and Opportunities" (PDF). IEEE.
  3. Synchronization
    • Uppoor, S; Flouris, M.D; Bilas, A (2010). "Cloud-based synchronization of distributed file system hierarchies". 2010 IEEE International Conference on Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS). Inst. of Comput. Sci. (ICS), Found. for Res. & Technol. - Hellas (FORTH), Heraklion, Greece. pp. 1–4. doi:10.1109/CLUSTERWKSP.2010.5613087. ISBN 978-1-4244-8395-2.
  4. Economic aspects
    • Lori M., Kaufman (2009). "Data Security in the World of Cloud Computing". Security & Privacy, IEEE. 7 (4): 161–64. doi:10.1109/MSP.2009.87.
    • Marston, Sean; Li, Zhi; Bandyopadhyay, Subhajyoti; Zhang, Juheng; Ghalsasi, Anand (2011). "Cloud computing — The business perspective". Decision Support Systems. 51 (1): 176–189. doi:10.1016/j.dss.2010.12.006.
    • Angabini, A; Yazdani, N; Mundt, T; Hassani, F (2011). "Suitability of Cloud Computing for Scientific Data Analyzing Applications; an Empirical Study". 2011 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing. Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran. pp. 193–199. doi:10.1109/3PGCIC.2011.37. ISBN 978-1-4577-1448-1.
Continue Reading...
Content from Wikipedia Licensed under CC-BY-SA.

Distributed file system for cloud

topic

Distributed file system for cloud

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations (create, delete, modify, read, write) on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system. Users can share computing resources through the Internet thanks to cloud computing which is typically characterized by scalable and elastic resources – such as physical servers, applications and any services that are virtualized and allocated dynamically ...more...

Member feedback about Distributed file system for cloud:

Cloud computing

Revolvy Brain (revolvybrain)

Revolvy User


Clustered file system

topic

Clustered file system

A clustered file system is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for each node). Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.[1] Shared-disk file system A shared-disk file system uses a storage-area network (SAN) to allow multiple computers to gain direct disk access at the block level. Access control and translation from file-level operations that applications use to block-level operations used by the SAN must take place on the client node. The most common type of clustered file system, the shared-disk file system —by adding mechanisms for concurrency control—pr ...more...

Member feedback about Clustered file system:

Data management

Revolvy Brain (revolvybrain)

Revolvy User


Comparison of distributed file systems

topic

Comparison of distributed file systems

In computing, a distributed file system (DFS) or network file system is any file system that allows access to files from multiple hosts sharing via a computer network. This makes it possible for multiple users on multiple machines to share files and storage resources. Distributed file systems differ in their performance, mutability of content, handling of concurrent writes, handling of permanent or temporary loss of nodes or storage, and their policy of storing content. Locally managed Client Written in License Access API Ceph C++ LGPL librados (C, C++, Python, Ruby), S3, Swift, FUSE BeeGFS C / C++ FRAUNHOFER FS (FhGFS) EULA,[1]GPLv2 client POSIX GlusterFS C GPLv3 libglusterfs, FUSE, NFS, SMB, Swift, libgfapi Infinit[2] C++ Proprietary (to be open sourced)[3] FUSE, Installable File System, NFS/SMB, POSIX, CLI, SDK (libinfinit) Isilon OneFS C/C++ Proprietary POSIX, NFS, SMB/CIFS, HDFS, HTTP, FTP, SWIFT Object, CLI, Rest API ObjectiveFS[4] C Proprietary ...more...

Member feedback about Comparison of distributed file systems:

Network file systems

Revolvy Brain (revolvybrain)

Revolvy User


Apache Hadoop

topic

Apache Hadoop

Apache Hadoop ( ) is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware[3]—still the common use—it has also found use on clusters of higher-end hardware.[4][5] All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.[2] The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing part which is a MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster. It then transfers packaged code into nodes to process the data in parallel. This approach takes advantage of data locality,[6] wher ...more...

Member feedback about Apache Hadoop:

Cloud infrastructure

Revolvy Brain (revolvybrain)

Revolvy User

jamie

(corncrew)

Revolvy User


List of file systems

topic

List of file systems

The following lists identify, characterize, and link to more thorough information on computer file systems. Many older operating systems support only their one "native" file system, which does not bear any name apart from the name of the operating system itself. Disk file systems Disk file systems are usually block-oriented. Files in a block-oriented file system are sequences of blocks, often featuring fully random-access read, write, and modify operations. ADFS – Acorn's Advanced Disc filing system, successor to DFS. AdvFS – Advanced File System, designed by Digital Equipment Corporation for their Digital UNIX (now Tru64 UNIX) operating system. APFS – Apple File System is a next-generation file system for Apple products. AthFS – AtheOS File System, a 64-bit journaled filesystem now used by Syllable. Also called AFS. BFS – the Boot File System used on System V release 4.0 and UnixWare. BFS – the Be File System used on BeOS, occasionally misnamed as BeFS. Open source implementation called OpenB ...more...

Member feedback about List of file systems:

Computing-related lists

Revolvy Brain (revolvybrain)

Revolvy User


InterPlanetary File System

topic

InterPlanetary File System

InterPlanetary File System (IPFS) is a protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system.[2] IPFS was initially designed by Juan Benet, and is now an open-source project developed with help from the community.[3][4] History In 2014, the IPFS protocol took advantage of the Bitcoin blockchain protocol and network infrastructure in order to store unalterable data, remove duplicated files across the network, and obtain address information for accessing storage nodes to search for files in the network.[5][2] Implementations in Go[6] and JavaScript[7] exist, and a Python implementation is in progress.[8] The Go implementation is considered to be the reference implementation[9] while formal specifications are developed.[10] Description IPFS is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the World Wide Web, but IPFS ...more...

Member feedback about InterPlanetary File System:

Application layer protocols

Revolvy Brain (revolvybrain)

Revolvy User


Ceph (software)

topic

Ceph (software)

In computing, Ceph (pronounced or ) is a free-software storage platform, implements object storage on a single distributed computer cluster, and provides interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalable to the exabyte level, and freely available. Ceph replicates data and makes it fault-tolerant,[7] using commodity hardware and requiring no specific hardware support. As a result of its design, the system is both self-healing and self-managing, aiming to minimize administration time and other costs. On April 21, 2016, the Ceph development team released "Jewel", the first Ceph release in which CephFS is considered stable. The CephFS repair and disaster recovery tools are feature-complete (snapshots, multiple active metadata servers and some other functionality is disabled by default).[8] The August, 2017 release (codename "Luminous") introduced the production-ready BlueStore storage format which avoids ...more...

Member feedback about Ceph (software):

Red Hat

Revolvy Brain (revolvybrain)

Revolvy User


Google File System

topic

Google File System

Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of Google File System code named Colossus was released in 2010.[1][2] Design Google File System is designed for system-to-system interaction, and not for user-to-system interaction. The chunk servers replicate the data automatically. GFS is enhanced for Google's core data storage and usage needs (primarily the search engine), which can generate enormous amounts of data that must be retained; Google File System grew out of an earlier Google effort, "BigFiles", developed by Larry Page and Sergey Brin in the early days of Google, while it was still located in Stanford. Files are divided into fixed-size chunks of 64 megabytes, similar to clusters or sectors in regular file systems, which are only extremely rarely overwritten, or shrunk; files are usually appended to or read. It is also designed and optimize ...more...

Member feedback about Google File System:

Google

Revolvy Brain (revolvybrain)

Revolvy User


Global file system

topic

Global file system

In computer science, global file system has historically referred to a distributed virtual name space built on a set of local file systems to provide transparent access to multiple, potentially distributed, systems.[1] These global file systems had the same properties such as blocking interface, no buffering etc. but guaranteed that the same path name corresponds to the same object on all computers deploying the filesystem. Also called distributed file systems these file systems rely on redirection to distributed systems, therefore latency and scalability can affect file access depending on where the target systems reside. History The Andrew File System attempted to solve this for a campus environment using caching and a weak consistency model to achieve local access to remote files. More recently, global file systems have emerged that combine cloud or any object storage, versioning and local caching to create a single, unified, globally accessible file system that does not rely on redirection to a storage ...more...

Member feedback about Global file system:

Computer file systems

Revolvy Brain (revolvybrain)

Revolvy User


LizardFS

topic

LizardFS

LizardFS is an open source distributed file system that is POSIX-compliant and licensed under GPLv3.[2][3] It was released in 2013 as fork of MooseFS.[4] LizardFS is a distributed, scalable and fault-tolerant file system. This is achieved by spreading data over several physical servers and their associated physical data storage. This storage is presented to the end-user as a single logical name space.[5] The file system is designed so that it is possible to add more disks and servers “on the fly”, without the need for any server reboots or shut-downs.[6] Description LizardFS is used as the underlying distributed file system for a product developed by CloudWeavers that uses the OpenNebula cloud management platform, marketed as hyper-converged infrastructure.[7] Other possible use cases include storage for cloud hosting services, render farms and backups.[8][9][10] High availability and the native Microsoft Windows client are licensed features that are available under a support contract. Alternatively, a ti ...more...

Member feedback about LizardFS:

File system management

Revolvy Brain (revolvybrain)

Revolvy User


CloudStore

topic

CloudStore

CloudStore (KFS, previously Kosmosfs) was Kosmix's C++ implementation of the Google File System. It parallels the Hadoop project, which is implemented in the Java programming language. CloudStore supports incremental scalability, replication, checksumming for data integrity, client side fail-over and access from C++, Java and Python. There is a FUSE module so that the file system can be mounted on Linux. In September 2007 Kosmix published Kosmosfs as open source.[1] The last commit activity was in 2010. The Google Code page for Kosmosfs now points to the Quantcast File System on GitHub which is the successor to KFS.[2] A former project on SourceForge used the name CloudStore in 2008.[3] See also Google File System List of file systems GlusterFS Moose File System References "Kosmix releases Google GFS workalike 'KFS' as open source". Skrentablog. September 27, 2007. Archived from the original on October 11, 2007. Retrieved June 25, 2013. "Kosmos distributed file system". Google Code web site. R ...more...

Member feedback about CloudStore:

User space file systems

Revolvy Brain (revolvybrain)

Revolvy User


MapR

topic

MapR

MapR is a business software company headquartered in Santa Clara, California. MapR provides access to a variety of data sources from a single computer cluster, including big data workloads such as Apache Hadoop and Apache Spark, a distributed file system, a multi-model database management system, and event stream processing, combining analytics in real-time with operational applications. Its technology runs on both commodity hardware and public cloud computing services. Funding MapR is privately held with original funding of $9 million from Lightspeed Venture Partners and New Enterprise Associates in 2009. MapR executives come from Google, Lightspeed Venture Partners, Informatica, EMC Corporation and Veoh. MapR had an additional round of funding led by Redpoint in August, 2011.[3][4] A round in 2013 was led by Mayfield Fund that also included Greenspring Associates.[5] In June 2014, MapR closed a $110 million financing round that was led by Google Capital. Qualcomm Ventures also participated, along with exi ...more...

Member feedback about MapR:

Software companies based in the San Francisco B...

Revolvy Brain (revolvybrain)

Revolvy User


File system

topic

File system

In computing, a file system or filesystem controls how data is stored and retrieved. Without a file system, information placed in a storage medium would be one large body of data with no way to tell where one piece of information stops and the next begins. By separating the data into pieces and giving each piece a name, the information is easily isolated and identified. Taking its name from the way paper-based information systems are named, each group of data is called a "file". The structure and logic rules used to manage the groups of information and their names is called a "file system". There are many different kinds of file systems. Each one has different structure and logic, properties of speed, flexibility, security, size and more. Some file systems have been designed to be used for specific applications. For example, the ISO 9660 file system is designed specifically for optical discs. File systems can be used on numerous different types of storage devices that use different kinds of media. The most ...more...

Member feedback about File system:

Computer file systems

Revolvy Brain (revolvybrain)

Revolvy User

file systems

(kunle)

Revolvy User


Cloud storage

Cloud storage is a model of computer data storage in which digital data is stored in logical pools. The physical storage spans multiple servers (sometimes in multiple locations), and the physical environment is typically owned and managed by a hosting company. These cloud storage providers are responsible for keeping the data available and accessible, and the physical environment protected and running. People and organizations buy or lease storage capacity from the providers to store user, organization, or application data. Cloud storage services may be accessed through a colocated cloud computing service, a web service application programming interface (API), or applications that utilize the API, such as cloud desktop storage, a cloud storage gateway, or web-based content management systems.

History

Cloud computing is believed to have been invented by Joseph Carl Robnett Licklider in the 1960s with his work on ARPANET to connect people and data from anywhere at any time.[1] In 1983, CompuServe offer…


Scality

Scality is a global company based in San Francisco, California that develops software-defined object storage. The Scality scale-out object storage software platform, called RING, is the company's commercial product. Scality RING software deploys on industry-standard x86 servers to store objects and files. Scality also offers a number of open source tools called Zenko, including Zenko CloudServer, which is compatible with the Amazon S3 API.

History

Scality was founded in 2009 by Jérôme Lecat, Giorgio Regni, Daniel Binsfeld, Serge Dugas, and Brad King. Scality raised $7 million of venture capital funding in March 2011.[1] A C-round of $22 million was announced in June 2013, led by Menlo Ventures and Iris Capital with participation from FSN PME and all existing investors, including Idinvest Partners, OMNES Capital and Galileo Partners.[2][3][4] Scality raised $45 million in August 2015; this Series D funding was led by Menlo Ventures with participation from all existing investors and one new strategic investor, BroadBand …


Ace Stream

Ace Stream is a peer-to-peer multimedia streaming protocol built using BitTorrent technology.[1] Ace Stream has been recognized by authoritative sources as a method for broadcasting and viewing bootlegged live video streams.[2]

History

Ace Stream began under the name TorrentStream as a pilot project using BitTorrent technology to stream live video. In 2013, TorrentStream was re-released under the name Ace Stream.[3]

Description

Ace Stream clients function as both client and server: when users stream a video feed using Ace Stream, they are simultaneously downloading from peers and uploading the same video to other peers.

References

"Ace Stream". info.acestream.org. Retrieved 2018-06-18.
"Cord Cutting Is Great, Except for Those Live Events". PCMAG. Retrieved 2018-06-18.
"Announcement! ACE Stream – New era of TV and Internet broadcasting". forum.torrentstream.org. Retrieved 2018-06-18.


Distributed hash table

A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table: (key, value) pairs are stored in the DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures. DHTs form an infrastructure that can be used to build more complex services, such as anycast, cooperative web caching, distributed file systems, domain name services, instant messaging, multicast, and also peer-to-peer file sharing and content distribution systems. Notable distributed networks that use DHTs include BitTorrent's distributed tracker, the Coral Content Distribution Network, the Kad network, the Storm botnet, the Tox insta…
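The "minimal disruption" property described above is usually achieved with consistent hashing: nodes and keys are hashed onto the same ring, and a key belongs to the first node at or after it. The sketch below is a generic illustration of that technique, not the routing scheme of any particular DHT named above; `ConsistentHashRing` and `ring_hash` are invented names.

```python
import hashlib
from bisect import bisect_right

def ring_hash(key: str) -> int:
    """Map a string to a point on the hash ring."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """A key is stored on the first node whose hash is at or after the
    key's hash, wrapping around; node arrivals and departures therefore
    move only the keys on the affected arc of the ring."""

    def __init__(self, nodes):
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def node_for(self, key: str) -> str:
        i = bisect_right(self.points, ring_hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("chunk-0042")
```

Removing one node reassigns only the keys that node owned; every other key keeps its owner, which is what lets a DHT absorb continual churn.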


Amazon Elastic File System

Amazon Elastic File System (Amazon EFS) is a cloud storage service provided by Amazon Web Services (AWS) designed to provide scalable, elastic, concurrent (with some restrictions),[3] and encrypted[4] file storage for use with both AWS cloud services and on-premises resources.[5] Amazon EFS is built to grow[6] and shrink automatically as files are added and removed. Amazon EFS supports the Network File System versions 4.0 and 4.1 (NFSv4) protocol[7] and controls access to files through Portable Operating System Interface (POSIX) permissions.[8]

Use cases

According to Amazon, use cases for this file system service typically include content repositories, development environments, web server farms, home directories and big data applications.[9]

Data consistency

Amazon EFS provides the open-after-close consistency semantics that applications expect from NFS.[3]

See also

GlusterFS
Red Hat Storage Server

References

"Amazon Elastic File System (Amazon EFS) is Now Generally Available". Amazon We…


Tahoe-LAFS

Tahoe-LAFS (Tahoe Least-Authority File Store[4]) is a free and open, secure, decentralized, fault-tolerant, distributed data store and distributed file system.[5][6] It can be used as an online backup system, or to serve as a file or web host similar to Freenet,[7] depending on the front-end used to insert and access files in the Tahoe system. Tahoe can also be used in a RAID-like fashion, combining multiple disks into a single large RAIN[8] pool of reliable data storage. The system is designed and implemented around the "principle of least authority" (POLA). Strict adherence to this convention is enabled by the use of cryptographic capabilities, which grant requesting agents the minimum set of privileges necessary to perform a given task. A RAIN array acts as a storage volume; these servers do not need to be trusted for the confidentiality or integrity of the stored data.

Fork

A patched version of Tahoe-LAFS has existed since 2011; it was made to run on the anonymous network I2P, with support for multiple introducers. …


ObjectiveFS

ObjectiveFS is a distributed file system developed by Objective Security Corp. It is a POSIX-compliant file system built with an object store backend.[1][2] It was initially released with an AWS S3 backend, and later added support for Google Cloud Storage and other object store devices. A beta was released in early 2013, and the first version was officially released on August 11, 2013.

Design

ObjectiveFS implements a log-structured file system on top of object stores (such as Amazon S3, Google Cloud Storage and other object store devices).[3] It is a POSIX-compliant file system and supports features such as dynamic file system size, soft and hard links, Unix attributes, extended attributes, Unix timestamps, users and permissions, no limit on file size, atomic renames, atomic file creation, directory renames, reads and writes anywhere in a file, named pipes, sockets, etc.[4][5] It implements client-side encryption and uses the NaCl crypto library, with algorithms like Salsa20 and Poly1305. This approach …
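The log-structured design mentioned above never overwrites data in place: every change is appended as a new record, and the latest record for a name defines the file's contents. This is a generic sketch of that idea under stated assumptions (a Python list stands in for the object store; `LogFS` is an invented name), not ObjectiveFS's actual on-store format.

```python
class LogFS:
    """Every write appends an immutable record to the log (a list
    standing in for the object store); a file's current contents are
    whatever its most recent record says, so nothing is overwritten
    in place."""

    def __init__(self):
        self.log = []                        # append-only

    def write(self, name: str, data: bytes):
        self.log.append((name, data))

    def read(self, name: str) -> bytes:
        for n, data in reversed(self.log):   # newest record wins
            if n == name:
                return data
        raise FileNotFoundError(name)

lfs = LogFS()
lfs.write("a.txt", b"first draft")
lfs.write("a.txt", b"second draft")          # the old record stays in the log
```

Append-only writes map naturally onto object stores, whose objects are immutable once written, which is one reason log-structured layouts suit S3-style backends.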


Comparison of video hosting services

The following compares general and technical information for a number of current, notable video hosting services. Please see the individual products' articles for further information.

General information

Basic general information about the hosts: creator/company, license/price, etc.

Break.com — owner: Defy Media; launched: 1998; content license: own TOS; ads: yes; videos (millions): unknown; views per day (millions): unknown; main server location: United States; censorship: yes;[1] multilingual: no; ad revenue sharing: unknown; video downloadable: no; registration needed: yes.
Dailymotion — owner: Vivendi; launched: March 15, 2005; content license: own TOS; ads: yes; videos (millions): >10; views per day (millions): ~60; main server location: France; censorship: yes;[1] multilingual: yes; ad revenue sharing: yes; video downloadable: no;[2] registration needed: yes.
EngageMedia — owner: EngageMedia; launched: March 2005; content license: Creative Commons; ads: no; videos (millions): unknown; views per day (millions): unknown; main server location: Germany; censorship: yes; multilingual: yes; ad revenue sharing: N/A; video downloadable: yes; registration needed: yes.
Flickr — owner: Yahoo; launched: 2004; content license: own TOS; ads: yes; videos (millions): unknown; views per day (millions): unknown; main server location: United States; censorship: yes; multilingual: yes; ad revenue sharing: no; video downloadable: no; registration needed: yes.
Globo Video — owner: Globo.com; launched: 2002; content license: unknown; ads: yes; videos (millions): >0.5; views per day (millions): unknown; main server location: Brazil; censorship: yes;[1] multilingual: no; …


Cloudant

Cloudant is an IBM software product, primarily delivered as a cloud-based service: a non-relational, distributed database service of the same name. Cloudant is based on the Apache-backed CouchDB project and the open source BigCouch project. Cloudant's service provides an integrated data management, search, and analytics engine designed for web applications. Cloudant scales databases on the CouchDB framework and provides hosting, administrative tools, analytics and commercial support for CouchDB and BigCouch.[1] Cloudant's distributed CouchDB service is used the same way as standalone CouchDB, with the added advantage of data being redundantly distributed over multiple machines. Cloudant was acquired by IBM from the start-up company of the same name. The acquisition was announced on February 24, 2014,[2] and completed on March 4 of that year.[3] The Cloudant Shared Plan was scheduled to be retired and migrated to IBM Cloud by March 31, 2018.[4]

History

Cloudant was founded by Alan Hoffm…


Comparison of streaming media systems

This is a comparison of streaming media systems; a more complete list of streaming media systems is also available.

General

The following compares general and technical information for a number of streaming media systems, both audio and video. Please see the individual systems' linked articles for further information.

atmosph3re — creator: Guillaume Carrier; first public release: 2005-08-15; latest stable version: 3.0.7 (2015-10-31); latest release date: 2015-10-31; cost: $30 perpetual license; license: proprietary; media: audio.
Cameleon — creator: Yatko; first public release: 2014-04-01; latest stable version: 1.0.7 (2016-11-11); latest release date: 2016-11-11; cost: free; license: proprietary; media: audio/video.
Darwin Streaming Server — creator: Apple Inc.; first public release: 1999-03-16; latest stable version: 6.0.3 (2007-05-10); latest release date: 2007-05-10; cost: free; license: APSL; media: audio/video.
Feng — creator: LSCube;[1] first public release: 2007-05-31; latest stable version: 2009-10-14; latest release date: 2009-10-04; cost: free; license: GPL; media: audio/video.
Firefly — creator: Ron Pedde; first public release: ?; latest stable version: 0.2.4.2 (2008-04-19); latest release date: 2008-04-19; cost: free; license: GPL; media: audio.
Adobe Flash Media Server — creator: Macromedia/Adobe Systems; first public release: 2002-07-09; …


Amazon Drive

Amazon Drive, formerly known as Cloud Drive, is a cloud storage application managed by Amazon.[1] The service offers secure cloud storage, file backup, file sharing, and photo printing. Using an Amazon account, files and folders can be transferred and managed from multiple devices, including web browsers, desktop applications, mobile phones, and tablets. Amazon Drive also lets its U.S. users order photo prints and photo books using the Amazon Prints service.[2] Today, Amazon Drive offers free unlimited photo storage with an Amazon Prime subscription or a Kindle Fire device, and a paid limited-storage service.[3][4] It has launched in major countries including the U.S., Canada, European nations, Japan, and Australia.[5] It also operates in China and Brazil as a free limited 5 GB storage service.

History

Amazon first announced the storage service on March 29, 2011, initially offering pay-as-you-need tiered storage plans for users. Users paid only for the storage tier they utilized, expandable up to a maximum of 1 tera…


Filecoin

Filecoin is an open-source, public cryptocurrency and digital payment system intended to be a blockchain-based digital storage and data retrieval method.[1][2][3][4] It is made by Protocol Labs and builds on top of the InterPlanetary File System.[1] Filecoin raised $52 million in a pre-initial coin offering (pre-ICO) sale[5] and $200 million in its ICO.[6]

See also

STORJ

References

"Showdown in the Cloud: Dropbox IPO, Meet the Filecoin ICO". Observer. 2017-07-11. Retrieved 2017-08-11.
Tepper, Fitz. "Filecoin's ICO opens today for accredited investors after raising $52M from advisers". TechCrunch. Retrieved 2017-08-11.
Vigna, Paul (2017-08-11). "Latest Hot Digital Coin Offering: $187 Million in One Hour for Filecoin". Wall Street Journal. ISSN 0099-9660. Retrieved 2017-08-11.
"Investors poured millions into a storage network that doesn't exist yet". Ars Technica. Retrieved 2017-08-11.
"Is There Such a Thing as a SEC Compliant ICO? Filecoin Thinks So - Raises $52 Mln". Cointelegraph. 6 August …


Replication (computing)

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Terminology

One speaks of:
data replication, if the same data is stored on multiple storage devices;
computation replication, if the same computing task is executed many times.

A computational task is typically replicated in space, i.e. executed on separate devices, or it could be replicated in time, if it is executed repeatedly on a single device. Replication in space or in time is often linked to scheduling algorithms.[1] Access to a replicated entity is typically uniform with access to a single, non-replicated entity: the replication itself should be transparent to an external user. Also, in a failure scenario, a failover of replicas should be hidden as much as possible; the latter refers to data replication with respect to Quality of Service (QoS) aspects.[2] Computer scientists talk about ac…
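One common way to make data replication transparent, as described above, is a primary that applies each write locally and forwards it synchronously to every backup before acknowledging. The sketch below is a minimal illustration of that pattern, not any specific product's protocol; `Primary` and `Replica` are invented names, and real systems must also handle failures and ordering.

```python
class Replica:
    """One redundant copy of the data set."""
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value

class Primary:
    """Writes are applied locally, then forwarded synchronously to
    every backup, so any replica can serve a consistent read and the
    redundancy stays invisible to the client."""
    def __init__(self, backups):
        self.store = Replica()
        self.backups = backups
    def write(self, key, value):
        self.store.apply(key, value)
        for b in self.backups:       # synchronous fan-out to redundant copies
            b.apply(key, value)
    def read(self, key):
        return self.store.data[key]

backups = [Replica(), Replica()]
primary = Primary(backups)
primary.write("x", 42)
```

After the write returns, every copy agrees, which is the consistency-between-redundant-resources goal; the cost is that write latency now includes the slowest backup.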


Apache Spark

Apache Spark is an open-source cluster-computing framework. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Overview

Apache Spark has as its architectural foundation the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way.[2] In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged,[3] even though the RDD API is not deprecated.[4][5] The RDD technology still underlies the Dataset API.[6][7] Spark and its RDDs were developed in 2012 in response to limitations in the MapReduce cluster-computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce progr…
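The RDD idea above, a read-only dataset whose transformations are recorded as lineage rather than executed eagerly, can be mimicked in plain Python. This is a toy sketch of the concept, not the real Spark API (`ToyRDD` and its methods are invented; real RDDs are partitioned across machines and replay lineage per lost partition).

```python
class ToyRDD:
    """Transformations (map, filter) only record lineage; nothing is
    computed until an action (collect) replays the lineage over the
    source data, which is also how a lost partition would be rebuilt."""

    def __init__(self, data, lineage=()):
        self.data = data
        self.lineage = lineage

    def map(self, f):
        return ToyRDD(self.data, self.lineage + (("map", f),))

    def filter(self, p):
        return ToyRDD(self.data, self.lineage + (("filter", p),))

    def collect(self):
        out = list(self.data)
        for op, f in self.lineage:
            out = [f(x) for x in out] if op == "map" else [x for x in out if f(x)]
        return out

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = rdd.collect()   # lineage replayed here, and could be replayed again after a failure
```

Because each transformation returns a new immutable object, the dataset is "read-only" and fault tolerance reduces to re-running the recorded lineage.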


BOSH (software)

BOSH is an open source project that offers a tool chain for release engineering, deployment and life-cycle management of large-scale distributed services. The tool chain consists of a server (the BOSH Director) and a command-line tool. BOSH is typically used to package, deploy and manage cloud software. While BOSH was initially developed by VMware in 2010 to deploy the Cloud Foundry PaaS, it can be used to deploy other software (such as Hadoop, RabbitMQ, or MySQL). BOSH is particularly well-suited to managing the whole life cycle of large distributed systems. Since March 2016, BOSH can manage deployments on both Windows[1] and Linux servers. A BOSH Director communicates with a single IaaS that provides the underlying networking and VMs (or containers). Several IaaS providers are supported: Amazon Web Services EC2, Apache CloudStack, Google Compute Engine, Microsoft Azure, OpenStack, and VMware vSphere. To support more underlying infrastructures, BOSH uses a concept of Cloud Provid…


Cloud Foundry

Cloud Foundry is an open source, multi-cloud application platform as a service (PaaS) governed by the Cloud Foundry Foundation, a 501(c)(6) organization.[1] The software was originally developed by VMware and then transferred to Pivotal Software, a joint venture of EMC, VMware and General Electric.

History

Originally conceived in 2009, Cloud Foundry was designed and developed by a small team at VMware led by Derek Collison and was originally called Project B29.[2][3][4] At the time, a different PaaS project, written in Java for Amazon EC2, used the name Cloud Foundry. It was founded by Chris Richardson in 2008 and acquired by SpringSource in 2009,[5] the same year VMware acquired SpringSource. The current project is unrelated to the project under SpringSource, but the name was adopted when the original SpringSource project ended. The announcement of Cloud Foundry took place in April 2011. A year later, in April 2012, BOSH, an open source tool chain for release engineering, deployment and life-cycle manag…


Comparison of file synchronization software

This is a list of file synchronization software. File synchronization is the process of ensuring that files in two or more locations are updated according to certain rules.

Definitions

The following definitions clarify the purpose of the columns used in the tables that follow.

Name: may contain a product name, or a product name plus an edition name, depending on what is discussed.
Operating system / Platform: the operating system column lists the operating systems on which the synchronization software runs. Platform is a broader term, used as the column name because some of the software in the table is OS-independent but requires a software platform such as Java SE or the .NET Framework.
Programming language: the language in which the software was written, if known. For closed-source software this information may not be known.
License: indicates the licensing model under which the …


Sector/Sphere

Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It can be broadly compared to Google's GFS and MapReduce technology. Sector is a distributed file system targeting data storage over a large number of commodity computers. Sphere is the programming architecture framework that supports in-storage parallel data processing for data stored in Sector. Sector/Sphere operates in a wide area network (WAN) setting. The system was created by Yunhong Gu (the author of the UDP-based Data Transfer Protocol) in 2006 and has since been maintained by a group of other developers.

Architecture

Sector/Sphere consists of four components. The security server maintains the system security policies, such as user accounts and the IP access control list. One or more master servers control operations of the overall system in addition to responding to various user requests. The slave nodes store the data files and process them upon request. The clients are the users' computers from whic…


Cloud computing

Cloud computing metaphor: the group of networked elements providing services need not be individually addressed or managed by users; instead, the entire provider-managed suite of hardware and software can be thought of as an amorphous cloud.

Cloud computing is an information technology (IT) paradigm that enables ubiquitous access to shared pools of configurable system resources and higher-level services that can be rapidly provisioned with minimal management effort, often over the Internet. Cloud computing relies on sharing of resources to achieve coherence and economies of scale, similar to a public utility. Third-party clouds enable organizations to focus on their core businesses instead of expending resources on computer infrastructure and maintenance.[1] Advocates note that cloud computing allows companies to avoid or minimize up-front IT infrastructure costs. Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and le…


Distributed Data Management Architecture

Distributed Data Management Architecture (DDM) is IBM's open, published software architecture for creating, managing and accessing data on a remote computer. DDM was initially designed to support record-oriented files; it was extended to support hierarchical directories, stream-oriented files, queues, and system command processing; it was further extended to be the base of IBM's Distributed Relational Database Architecture (DRDA); and finally, it was extended to support data description and conversion. Defined in the period from 1980 to 1993, DDM specifies the necessary components, messages, and protocols, all based on the principles of object-orientation. DDM is not, in itself, a piece of software; the implementation of DDM takes the form of client and server products. As an open architecture, products can implement subsets of DDM architecture, and products can extend DDM to meet additional requirements. Taken together, DDM products implement a distributed file system.

DDM architecture in the media. Distribut…


Object storage

Object storage (also known as object-based storage[1]) is a computer data storage architecture that manages data as objects, as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks.[2] Each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level (object-storage device), the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, such as interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions like data replication and data distribution at object-level granularity. Object-storage systems allow retention of massive amounts of unstructured data. Object storage is used for purposes …
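The object model described above (data + metadata + globally unique identifier, in a flat namespace) can be sketched minimally as follows. This is a conceptual illustration, not the API of any real object store; `ObjectStore`, `put`, `get`, and `head` are invented names, though `head` mirrors the common convention of a metadata-only lookup.

```python
import uuid

class ObjectStore:
    """Flat namespace: no directories, no blocks. Each object couples
    the data itself with its metadata under a globally unique id."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes, **metadata) -> str:
        oid = str(uuid.uuid4())          # globally unique identifier
        self.objects[oid] = {"data": data, "meta": dict(metadata)}
        return oid

    def get(self, oid: str) -> bytes:
        return self.objects[oid]["data"]

    def head(self, oid: str) -> dict:
        return self.objects[oid]["meta"]

store = ObjectStore()
oid = store.put(b"report contents", content_type="text/plain", owner="alice")
```

Because the identifier carries no location or hierarchy, the namespace can span many machines: any node that can map `oid` to a placement can serve the object.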


BigCouch

BigCouch is an open source, highly available, fault-tolerant, clustered and API-compliant version of Apache CouchDB, which was maintained by Cloudant. On January 5, 2012, Cloudant announced they would contribute the BigCouch horizontal scaling framework to the CouchDB project.[1] The merge was completed in July 2013.[2] Cloudant announced in June 2015 that they were no longer supporting BigCouch.[3] BigCouch allows users to create clusters of CouchDBs that are distributed over an arbitrary number of servers. While it appears to the end-user as one CouchDB instance, it is in fact one or more nodes in an elastic cluster, acting in concert to store and retrieve documents, index and serve views, and serve CouchApps. Clusters behave according to concepts outlined in Amazon's Dynamo paper,[4] namely that each node can accept requests, data is placed on partitions based on a consistent hashing algorithm, and quorum protocols are used for read/write operations. It relies on Erlang and the Open Telecom Platform, des…
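The Dynamo-style quorum idea referenced above is simple arithmetic: with N replicas, a write acknowledged by W of them, and a read that consults R of them, choosing R + W > N forces every read set to overlap every write set. The sketch below illustrates only that rule under stated assumptions (fixed replica sets, no failures or versioned conflict resolution beyond highest-version-wins); `QuorumStore` is an invented name, not BigCouch's implementation.

```python
def quorum_ok(n: int, r: int, w: int) -> bool:
    """If R + W > N, every read quorum overlaps every write quorum,
    so a read meets at least one replica holding the latest write."""
    return r + w > n

class QuorumStore:
    def __init__(self, n=3, r=2, w=2):
        assert quorum_ok(n, r, w)
        self.replicas = [{} for _ in range(n)]
        self.r, self.w = r, w

    def write(self, key, value, version):
        for rep in self.replicas[: self.w]:      # acknowledge after W copies
            rep[key] = (version, value)

    def read(self, key):
        # consult R replicas; take the last R, the worst case for overlap
        tail = self.replicas[len(self.replicas) - self.r:]
        answers = [rep[key] for rep in tail if key in rep]
        return max(answers)[1]                   # highest version wins

qs = QuorumStore()
qs.write("doc", "contents-v1", version=1)
```

With N=3, R=2, W=2 the write touches replicas {0, 1} and the read consults {1, 2}; replica 1 is the guaranteed overlap that lets the read return the latest value.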


Parallel Virtual File System

The Parallel Virtual File System (PVFS) is an open source parallel file system. A parallel file system is a type of distributed file system that distributes file data across multiple servers and provides for concurrent access by multiple tasks of a parallel application. PVFS was designed for use in large-scale cluster computing and focuses on high-performance access to large data sets. It consists of a server process and a client library, both of which are written entirely as user-level code. A Linux kernel module and pvfs-client process allow the file system to be mounted and used with standard utilities. The client library provides for high-performance access via the Message Passing Interface (MPI). PVFS is jointly developed by the Parallel Architecture Research Laboratory at Clemson University, the Mathematics and Computer Science Division at Argonne National Laboratory, and the Ohio Supercomputer Center. PVFS development has been funded by NASA Goddard Space Flight Center, Argonne, NSF PAC…
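Distributing file data across multiple servers, as a parallel file system does, is commonly done by striping: the file is cut into fixed-size units that are dealt round-robin across the servers, so a parallel job can fetch many units at once. This is a generic sketch of striping, not PVFS's actual layout algorithm; `stripe` and `reassemble` are invented names, and the 4-byte unit is unrealistically small for clarity.

```python
def stripe(data: bytes, servers: int, unit: int = 4):
    """Deal fixed-size stripe units round-robin across servers so a
    parallel application can read many units concurrently."""
    units = [data[i:i + unit] for i in range(0, len(data), unit)]
    placement = [[] for _ in range(servers)]
    for i, u in enumerate(units):
        placement[i % servers].append(u)
    return placement

def reassemble(placement):
    """Walk the servers round-robin to restore the original byte order."""
    rows = max(len(p) for p in placement)
    out = []
    for row in range(rows):
        for server in placement:
            if row < len(server):
                out.append(server[row])
    return b"".join(out)

placement = stripe(b"The quick brown fox", servers=3)
```

Because consecutive units live on different servers, a large sequential read becomes several independent transfers that can proceed in parallel.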


STORJ

STORJ is a cryptocurrency and digital payment system, and also a blockchain-based digital storage and data retrieval method, currently using the Ethereum system.[1][2][3] In April 2016, Storj Labs officially announced the beta release of their cloud platform, which allows users to store data on a decentralized network. The beta was accessible only through an invite system, to become publicly available after the platform's initial testing phase.[4] In July 2016, Counterparty, the technology for decentralized financial tools on the Bitcoin blockchain, and Storj teamed up and combined resources to develop payment channel technology for use on the Counterparty network.[5]

See also

Filecoin

References

Shin, Laura. "Filecoin ICO, Launching Next Week, Aims To Resolve Token Sale Problems". Forbes. Retrieved 2017-08-12.
"Storj to Migrate Decentralized Storage Service to Ethereum Blockchain". CoinDesk. 2017-03-23. Retrieved 2017-08-12.
"When blockchain meets big data, the payoff …


BeeGFS

BeeGFS is a parallel file system developed and optimized for high-performance computing. BeeGFS includes a distributed metadata architecture for scalability and flexibility; its most important aspect is data throughput. BeeGFS is developed at the Fraunhofer Institute for Industrial Mathematics (ITWM) in Kaiserslautern, Germany, and was initially known under the name FhGFS, short for Fraunhofer Gesellschaft File System (or Fraunhofer FS). At ISC'14 in Leipzig, Fraunhofer presented the new name to the public for the first time, even though the renaming process had begun with the founding of a Fraunhofer spin-off. The software can be downloaded and used free of charge from the project's website.[2] BeeGFS started in 2005 as an in-house development at Fraunhofer ITWM to replace the existing file system on the institute's new compute cluster and to be used in a production environment. In 2007, the first beta version of the software was announced during ISC07 in Dresden, Germany, and introduced to the p…


P2PTV

P2PTV overlay network serving three video streams.

P2PTV refers to peer-to-peer (P2P) software applications designed to redistribute video streams in real time on a P2P network; the distributed video streams are typically TV channels from all over the world but may also come from other sources. The appeal of these applications is significant: they have the potential to make any TV channel globally available by any individual feeding the stream into the network, where each peer joining to watch the video also relays it to other peer viewers, allowing a scalable distribution among a large audience with no incremental cost for the source.

Technology and use

In a P2PTV system, each user, while downloading a video stream, is simultaneously uploading that stream to other users, thus contributing to the overall available bandwidth. The arriving streams are typically a few minutes delayed compared to the original sources. The video quality of the channels usually depends on how many users are watching; …


Red Hat Storage Server

Red Hat Gluster Storage, formerly Red Hat Storage Server, is a computer storage product from Red Hat. It is based on open source technologies such as GlusterFS and Red Hat Enterprise Linux.[2] The latest release, RHGS 3.1, combines Red Hat Enterprise Linux (RHEL 6 and RHEL 7) with the latest GlusterFS community release, oVirt, and the XFS file system.[3][4] In April 2014, Red Hat re-branded the GlusterFS-based Red Hat Storage Server as "Red Hat Gluster Storage".[5]

Description

Red Hat Gluster Storage, a scale-out NAS product, uses GlusterFS, a distributed file system, as its basis. Red Hat Gluster Storage also exemplifies software-defined storage (SDS).

History

In June 2012, Red Hat Gluster Storage was announced as a commercially supported integration of GlusterFS with Red Hat Enterprise Linux.[6]

Releases

3.3 (Release Notes[7]), 3.2, 3.1, 3.0, 2.1

References

https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/3.3_release_notes/
"What is Red Hat Gluster Storage?". Red H…


Wuala

Wuala was a secure online file storage, file synchronization, versioning and backup service originally developed and run by Caleido Inc.[2] It later became part of LaCie, which is in turn owned by Seagate Technology. The service stored files in data centres provided by Wuala in multiple European countries (France, Germany, Switzerland).[3] An earlier version also supported distributed storage on other users' machines; however, this feature was dropped.[4] On 17 August 2015, Wuala announced that it was discontinuing its service and that all stored data would be deleted on 15 November 2015.[5] Wuala recommended a rival cloud storage startup, Tresorit, as an alternative to its remaining customers.[5][6]

History

Most research and development occurred at the Swiss Federal Institute of Technology (ETH) in Zürich.

14 August 2008: an "open beta" Java applet, available from the website, could be run from a web browser.
19 September 2008: the Wuala Webstart[7] project was registered …


Distributed data store

A distributed data store is a computer network where information is stored on more than one node, often in a replicated fashion.[1] The term usually refers either to a distributed database, where users store information on a number of nodes, or to a computer network in which users store information on a number of peer network nodes.

Distributed databases
Distributed databases are usually non-relational databases that enable quick access to data over a large number of nodes. Some distributed databases expose rich query abilities, while others are limited to key-value store semantics. Examples of such limited distributed databases are Google's Bigtable, which is much more than a distributed file system or a peer-to-peer network,[2] Amazon's Dynamo,[3] and Windows Azure Storage.[4] Because the ability to perform arbitrary queries is less important than availability, designers of distributed data stores have increased the latter at the expense of consistency. But the high-speed read/write access results in re[…]
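The combination of key-value semantics and replication described above can be sketched in a few lines of Python. This is an illustrative toy, not the design of Bigtable, Dynamo, or Windows Azure Storage; the node names, placement scheme, and replication factor are invented for the example.

```python
# Toy replicated key-value store: every write is copied to a fixed
# number of replica nodes, and a read succeeds as long as at least
# one replica holding the key is still alive.
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}      # this node's local key-value storage
        self.alive = True

class ReplicatedKVStore:
    def __init__(self, nodes, replication_factor=3):
        self.nodes = nodes
        self.rf = replication_factor

    def _replicas(self, key):
        # Deterministically pick rf consecutive nodes for a key
        # (a toy stand-in for consistent hashing).
        start = hash(key) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.rf)]

    def put(self, key, value):
        for node in self._replicas(key):
            if node.alive:
                node.data[key] = value

    def get(self, key):
        # Any live replica can answer the read.
        for node in self._replicas(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)

nodes = [Node(f"node-{i}") for i in range(5)]
store = ReplicatedKVStore(nodes, replication_factor=3)
store.put("user:42", "alice")
nodes[hash("user:42") % 5].alive = False   # first replica fails
print(store.get("user:42"))                # still readable: alice
```

Trading consistency for availability, as the excerpt notes, shows up even here: after a node fails, writes silently land on fewer replicas, so a recovered node may hold stale data.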

Distributed networking

Distributed networking is a distributed computing network system in which the computer programming, the software, and the data to be worked on are spread across more than one computer, but the computers communicate with, or are dependent upon, each other. Usually, this is implemented over a computer network. Prior to the emergence of low-cost desktop computing power, computing was generally centralized on one computer. Although such centers still exist, distributed networking applications and data operate more efficiently over a mix of desktop workstations, local area network servers, regional servers, web servers, and other servers.

One popular trend is client/server computing: the principle that a client computer can provide certain capabilities for a user and request others from other computers that provide services for the clients. (The Web's Hypertext Transfer Protocol is an example of this idea.) Enterprises that have grown in scale over the years and those that are continuing to gro[…]

Gluster

Gluster Inc. was a software company that provided an open source platform for scale-out public and private cloud storage. The company was privately funded and headquartered in Sunnyvale, California, with an engineering center in Bangalore, India. Gluster was funded by Nexus Venture Partners and Index Ventures. Gluster was acquired by Red Hat on October 7, 2011.[1]

History
The name Gluster combined GNU and cluster. Despite the similarity in names, Gluster is not related to the Lustre file system and does not incorporate any Lustre code. Gluster based its product on GlusterFS, an open-source, software-based, network-attached file system that deploys on commodity hardware.[2] The initial version of GlusterFS was written by Anand Babu Periasamy, Gluster's founder and CTO.[3] In May 2010, Ben Golub became the president and chief executive officer.[4][5] Red Hat became the primary author and maintainer of the GlusterFS open source project after acquiring the Gluster company in October 2011.[1] The product was first[…]

Content delivery network

[Figure: single-server distribution (left) versus CDN scheme of distribution (right)]

A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers. The goal is to distribute service spatially relative to end users to provide high availability and high performance. CDNs serve a large portion of Internet content today, including web objects (text, graphics and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social media sites.

CDNs are a layer in the Internet ecosystem. Content owners such as media companies and e-commerce vendors pay CDN operators to deliver their content to their end users. In turn, a CDN pays ISPs, carriers, and network operators for hosting its servers in their data centers. CDN is an umbrella term spanning different types of content delivery services: video streaming, software downloads, web and mobile content[…]
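The core idea of distributing service "spatially relative to end users" can be sketched as a request router that sends each client to the edge server with the lowest measured latency, a simple stand-in for geographic or network proximity. The server names and latency figures below are invented for illustration; real CDN request routing (DNS-based or anycast) is far more elaborate.

```python
# Toy CDN request routing: direct a client to the edge server with
# the lowest observed round-trip latency.
def pick_edge(latencies_ms):
    """latencies_ms maps edge-server name -> measured RTT in milliseconds."""
    return min(latencies_ms, key=latencies_ms.get)

# Latencies as measured from one hypothetical client in Europe.
observed = {
    "edge-frankfurt": 18.0,
    "edge-virginia": 95.0,
    "edge-singapore": 210.0,
}
print(pick_edge(observed))  # -> edge-frankfurt
```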

Filesystem in Userspace

Filesystem in Userspace (FUSE) is a software interface for Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space, while the FUSE module provides only a "bridge" to the actual kernel interfaces. FUSE is available for Linux, FreeBSD, OpenBSD, NetBSD (as puffs), OpenSolaris, Minix 3, Android and macOS.[2] FUSE is free software originally released under the terms of the GNU General Public License and the GNU Lesser General Public License.

History
The FUSE system was originally part of AVFS (A Virtual Filesystem), a filesystem implementation heavily influenced by the translator concept of the GNU Hurd.[3] FUSE was originally released under the terms of the GNU General Public License and the GNU Lesser General Public License, and was later reimplemented as part of the FreeBSD base system[4] and released under the terms of the Simplified BSD license. An ISC-licensed re-implementation by Sy[…]

Infinit (storage platform)

Infinit is a software-based storage platform designed to be scalable and resilient under extreme conditions.[2] Unlike most distributed systems, which rely on a master/slave model, Infinit relies on a decentralized (i.e., peer-to-peer) architecture which, although more complicated, does away with bottlenecks and single points of failure. In addition, Infinit promises to allow developers and operators to customize the underlying storage with a set of policies covering encryption, redundancy, compression, deduplication, data placement and more. However, as of October 2016, only a few of those policies had been made available.[3] Likewise, only a single interface had been made public so far: a POSIX-compliant API that relies on FUSE on Linux/macOS and Dokan on Windows. The Infinit storage platform has been welcomed by the community in the container space[4] thanks to its multiple interfaces (block, object and file), along with the possibility to customize the platform to meet the needs of a cont[…]

Cooperative storage cloud

A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers (nodes) hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage contributed in aggregate must be at least equal to the amount of storage needed by end users. However, some nodes may contribute less storage and some may contribute more; there may be reward models to compensate the nodes contributing more. Unlike a traditional storage cloud, a cooperative does not directly employ dedicated servers for the actual storage of the data, thereby eliminating the need for a significant dedicated hardware investment. Each node in the cooperative runs specialized software which communicates with a centralized control and orchestration server, allowing the node to both consume and contribute storage space to the cloud. The centralized control and orchestration server requires several orders of magnitude less resources (storage, comput[…]
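The viability condition stated above — contributed storage must cover end-user demand — reduces to simple arithmetic. The sketch below adds a redundancy factor, since cooperative schemes typically store each block on more than one node to survive peers going offline; the factor of 3 and the per-node figures are invented for illustration.

```python
# Toy viability check for a cooperative storage cloud: aggregate
# contributed storage must cover user demand multiplied by the
# replication overhead.
def is_viable(contributions_gb, demand_gb, redundancy=3):
    """contributions_gb: per-node contributed storage in GB (unequal is fine)."""
    return sum(contributions_gb) >= demand_gb * redundancy

# Nodes contribute unequal amounts, as the text notes.
nodes = [500, 250, 1000, 120, 730]       # total contributed: 2600 GB
print(is_viable(nodes, demand_gb=800))   # 2600 >= 800*3 -> True
print(is_viable(nodes, demand_gb=900))   # 2600 <  900*3 -> False
```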

XtreemFS

XtreemFS is an object-based, distributed file system for wide area networks.[1] XtreemFS's outstanding feature is full (all components) and real (all failure scenarios, including network partitions) fault tolerance, while maintaining POSIX file system semantics. Fault tolerance is achieved by using Paxos-based lease negotiation algorithms, which are used to replicate files and metadata. Support for SSL and X.509 certificates makes XtreemFS usable over public networks.

XtreemFS has been under development since early 2007. A first public release was made in August 2008. XtreemFS 1.0 was released in August 2009; the 1.0 release includes support for read-only replication with failover, data center replica maps, parallel reads and writes, and a native Windows client. Release 1.1 added automatic on-close replication and POSIX advisory locks. In mid-2011, release 1.3 added read/write replication for files. Version 1.4 underwent extensive testing and is considered production-quality. An improved Hadoop integration and support for[…]

Comparison of file transfer protocols

This article lists communication protocols that are designed for file transfer over a telecommunications network. Protocols for shared file systems—such as 9P and the Network File System—are beyond the scope of this article, as are file synchronization protocols.

Protocols for packet-switched networks
A packet-switched network transmits data that is divided into units called packets. A packet comprises a header (which describes the packet) and a payload (the data). The Internet is a packet-switched network, and most of the protocols in this list are designed for its protocol stack, the IP protocol suite. They use one of two transport layer protocols: the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). In the tables below, the "Transport" column indicates which protocol(s) the transfer protocol uses at the transport layer. Some protocols designed to transmit data over UDP also use a TCP port for oversight. The "Server port" column indicates the port from which the server transmits[…]
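The header-plus-payload structure of a packet described above can be made concrete with a minimal, invented packet format: a fixed-size header carrying a sequence number and the payload length, followed by the payload bytes. Real protocol headers (IP, TCP, UDP) are far richer; this layout exists only for the example.

```python
import struct

# Invented packet layout: 4-byte sequence number + 2-byte payload
# length (both big-endian), followed by the payload itself.
HEADER = struct.Struct("!IH")

def encode(seq, payload):
    """Build a packet: header describing the payload, then the payload."""
    return HEADER.pack(seq, len(payload)) + payload

def decode(packet):
    """Split a packet back into its sequence number and payload."""
    seq, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    return seq, payload

pkt = encode(7, b"hello")
print(decode(pkt))  # -> (7, b'hello')
```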
