Google File Systems Essay Example

Each chunkserver in GFS uses checksumming to detect data corruption. Given the enormous amount of data stored in GFS, such failures are likely rather than rare.

The system frequently encounters disk failures that corrupt or lose data during both reads and writes. Such data can nonetheless be recovered from other replicas of the chunk. Because replicas are not guaranteed to be byte-for-byte identical, each chunkserver must independently verify the integrity of its own copy by maintaining checksums. Each chunk is divided into 64 KB blocks, and each block has a corresponding 32-bit checksum. On a read, the chunkserver verifies the checksum of every data block that overlaps the requested range before returning any data to the requester, whether that requester is a client or another chunkserver.
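
To make the block layout concrete, here is a minimal Go sketch of that read-time verification. It assumes CRC-32 as the 32-bit checksum (the specific algorithm is not stated above), and the type and function names are illustrative, not the actual GFS code.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

const blockSize = 64 * 1024 // 64 KB blocks, as described above

// chunk pairs a chunkserver's copy of the data with one 32-bit checksum
// per 64 KB block.
type chunk struct {
	data      []byte
	checksums []uint32
}

// newChunk computes a checksum for every 64 KB block of the data.
func newChunk(data []byte) *chunk {
	c := &chunk{data: data}
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data)
		}
		c.checksums = append(c.checksums, crc32.ChecksumIEEE(data[start:end]))
	}
	return c
}

// read verifies every block that overlaps [off, off+n) before returning any
// bytes, so corrupted data is never handed to a client or to another
// chunkserver; on a mismatch the caller would fetch from another replica.
func (c *chunk) read(off, n int) ([]byte, error) {
	first := off / blockSize
	last := (off + n - 1) / blockSize
	for b := first; b <= last; b++ {
		start := b * blockSize
		end := start + blockSize
		if end > len(c.data) {
			end = len(c.data)
		}
		if crc32.ChecksumIEEE(c.data[start:end]) != c.checksums[b] {
			return nil, fmt.Errorf("block %d failed its checksum", b)
		}
	}
	return c.data[off : off+n], nil
}

func main() {
	c := newChunk(make([]byte, 3*blockSize)) // a small three-block chunk
	if _, err := c.read(70_000, 10_000); err != nil {
		fmt.Println("read rejected:", err)
		return
	}
	fmt.Println("read passed checksum verification")
}
```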

So, to prevent corruption from spreading to other machines, chunkservers do not propagate corrupted data. When a write overwrites an existing range of a chunk, however, the first and last blocks of the range being overwritten must be read and verified first. After the write completes, the checksums for the affected blocks are recomputed.
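
A small sketch of that rule, using hypothetical names: only the first and last 64 KB blocks touched by an overwrite can be partially modified, so only those need to be read and verified before the write; every affected block then gets a fresh checksum afterwards.

```go
package main

import "fmt"

const blockSize = 64 * 1024 // 64 KB blocks

// blocksToVerify returns the indices of the first and last blocks touched
// by a write to [off, off+n). These two blocks may be only partially
// overwritten, so their existing contents must still be valid; blocks in
// the middle are fully replaced and need no pre-read.
func blocksToVerify(off, n int) (first, last int) {
	first = off / blockSize
	last = (off + n - 1) / blockSize
	return first, last
}

func main() {
	// A hypothetical write that replaces bytes 100,000..299,999 of a chunk.
	first, last := blocksToVerify(100_000, 200_000)
	fmt.Printf("verify blocks %d and %d before overwriting; recompute checksums for blocks %d..%d after the write\n",
		first, last, first, last)
}
```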

Diagnostic tools such as logging also play a vital role in identifying issues across the system; they benefit both overall system performance and the company, and they come at minimal cost. These logs record significant events as well as every RPC request and reply. One advantage of this logging is that logs can be removed at any time without affecting the correctness of the system. The RPC logs contain the requests and responses sent over the network, excluding the file data being read or written. By correlating requests with replies, the complete interaction history between machines can be reconstructed.
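
As a rough illustration of that correlation, the following Go sketch pairs request and reply log lines by an assumed RPC ID; the entry format and field names are invented for the example and are not the actual GFS log format.

```go
package main

import "fmt"

// logEntry is a simplified stand-in for one line of an RPC log: either the
// request or the reply of a single RPC, identified by an ID and carrying
// no file data, matching the description above.
type logEntry struct {
	rpcID   int
	isReply bool
	msg     string
}

// correlate pairs each request with its reply by RPC ID; walking the result
// reconstructs the interaction history between the machines.
func correlate(entries []logEntry) map[int][2]string {
	history := make(map[int][2]string)
	for _, e := range entries {
		pair := history[e.rpcID]
		if e.isReply {
			pair[1] = e.msg
		} else {
			pair[0] = e.msg
		}
		history[e.rpcID] = pair
	}
	return history
}

func main() {
	entries := []logEntry{
		{rpcID: 7, isReply: false, msg: "ReadChunk(handle=42, offset=0)"},
		{rpcID: 7, isReply: true, msg: "OK, 65536 bytes"},
	}
	for id, pair := range correlate(entries) {
		fmt.Printf("rpc %d: request %q -> reply %q\n", id, pair[0], pair[1])
	}
}
```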


Initially, the Google File System was designed for backend production systems, and it gradually evolved to also serve research and development tasks.

This file system is designed to guarantee reliability for Google's core data and usage needs. Its main purpose is to support Google's market-leading internet search engine, which handles an average of 40,000 search queries per second according to Internet Live Stats; at 86,400 seconds in a day, that is roughly 3.5 billion searches per day, which illustrates the volume of data this file system manages and stores. Other applications such as Google Drive, Gmail, and Google Maps also use this file system, although at levels not comparable to the search engine.

The primary objective of the system is to maximize the success of the search engine, even though its semantics may not always suit other applications; this mismatch can hurt performance in applications that assume a read-once, write-once workload. To understand how the system operates, consider a basic write operation. First, the application sends the file name and the data to the GFS client. The client then sends the file name and chunk index to the master. The master replies with the identity of the primary replica and the locations of the other replicas, and the client caches this information. To improve performance, the client pushes the data to all replicas, where each chunkserver holds it in an LRU buffer until it is used or aged out.
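
The following Go sketch walks through those client-side steps under assumed names (`chunkInfo`, `askMaster`, and the server names are all illustrative): look up the chunk at the master on a cache miss, cache the reply, and push the data to every replica, where it waits in an LRU buffer.

```go
package main

import "fmt"

// chunkInfo is what the master returns for a (file name, chunk index)
// lookup: the chunk handle, the current primary, and the other replica
// locations. Field names are illustrative, not the actual GFS messages.
type chunkInfo struct {
	handle   uint64
	primary  string
	replicas []string
}

// gfsClient caches master responses so repeated writes to the same chunk
// do not need another master round trip.
type gfsClient struct {
	cache map[string]chunkInfo // key: "fileName/chunkIndex"
}

// lookup asks the master (stubbed out here) only on a cache miss.
func (c *gfsClient) lookup(file string, chunkIndex int) chunkInfo {
	key := fmt.Sprintf("%s/%d", file, chunkIndex)
	if info, ok := c.cache[key]; ok {
		return info
	}
	info := askMaster(file, chunkIndex)
	c.cache[key] = info
	return info
}

// pushData sends the data to every replica; each chunkserver holds it in
// an internal LRU buffer until the primary orders the actual write.
func (c *gfsClient) pushData(info chunkInfo, data []byte) {
	for _, server := range append([]string{info.primary}, info.replicas...) {
		fmt.Printf("pushing %d bytes to %s (buffered, not yet applied)\n", len(data), server)
	}
}

// askMaster stands in for the real master RPC.
func askMaster(file string, chunkIndex int) chunkInfo {
	return chunkInfo{handle: 42, primary: "cs-1", replicas: []string{"cs-2", "cs-3"}}
}

func main() {
	client := &gfsClient{cache: make(map[string]chunkInfo)}
	info := client.lookup("/logs/web.log", 0)
	client.pushData(info, []byte("example record"))
}
```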

Once all of the client's data has been received by the replicas, the client sends a write request to the primary. The primary forwards the write request to all secondary replicas, and once they have applied it, the primary notifies the client that the write is complete. If there is a failure, the primary informs the client, and it becomes the client's responsibility to retry the mutation.
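
Here is a minimal Go sketch of that commit step, with invented `replica` and `commit` names: the primary applies the write, forwards it to the secondaries, and reports either completion or failure, in which case the client retries.

```go
package main

import "fmt"

// replica is a stand-in for a chunkserver that has already buffered the
// client's data; applyWrite simulates applying it to the chunk on disk.
type replica struct {
	name    string
	healthy bool
}

func (r *replica) applyWrite() error {
	if !r.healthy {
		return fmt.Errorf("%s: write failed", r.name)
	}
	return nil
}

// commit is the primary's side of the exchange described above: apply the
// write locally, forward it to every secondary, and report success to the
// client only if all of them succeed.
func commit(primary *replica, secondaries []*replica) error {
	if err := primary.applyWrite(); err != nil {
		return err
	}
	for _, s := range secondaries {
		if err := s.applyWrite(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	primary := &replica{name: "cs-1", healthy: true}
	secondaries := []*replica{
		{name: "cs-2", healthy: true},
		{name: "cs-3", healthy: false},
	}

	// On failure the primary reports the error, and retrying the mutation
	// is the client's responsibility.
	for attempt := 1; attempt <= 3; attempt++ {
		if err := commit(primary, secondaries); err != nil {
			fmt.Printf("attempt %d failed: %v; client retries\n", attempt, err)
			continue
		}
		fmt.Printf("attempt %d succeeded\n", attempt)
		break
	}
}
```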

A read operation, by contrast, is simpler: the application only provides the file name and a byte range to the GFS client. The client contacts the master with the file name and chunk index, and the master returns the chunk handle and the replica locations to the client, which then reads the data from one of those chunkservers.
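
A short sketch of the client-side arithmetic behind that exchange, assuming the 64 MB chunk size from the original GFS paper (the essay itself does not give the chunk size); the function and file names are illustrative.

```go
package main

import "fmt"

// The GFS paper fixes the chunk size at 64 MB; it is stated here only to
// make the offset-to-index conversion concrete.
const chunkSize = 64 * 1024 * 1024

// chunkIndexFor shows the translation a client performs before contacting
// the master: the byte offset requested by the application is converted to
// a chunk index within the file.
func chunkIndexFor(offset int64) int64 {
	return offset / chunkSize
}

func main() {
	// Hypothetical read of bytes starting 200 MB into a file.
	offset := int64(200 * 1024 * 1024)
	index := chunkIndexFor(offset)
	fmt.Printf("send (file name, chunk index %d) to the master; it replies with the chunk handle and replica locations, and the data is then read from one of those chunkservers\n", index)
}
```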
