NoSQL Glossary

question
BigTable
answer
BigTable is Google's proprietary NoSQL database, although the name can also refer to a NoSQL database architecture. BigTable databases have many tables, each of which has many rows. Unlike rows in a relational database, rows in a BigTable database may contain thousands of columns, compound columns, and multiple row versions, and columns do not need to be predefined. The basics of the BigTable architecture are explained in a white paper from Google.
question
Cassandra
answer
An open source database originally developed at Facebook. Cassandra combines architectural elements from BigTable and Dynamo to create a decentralized, massively scalable database.
question
Cluster
answer
A collection of sharded servers. The physical organization of a cluster varies from implementation to implementation.
question
Columnar database
answer
In a columnar database, data is stored by column rather than by row. This model is advantageous for working with aggregations of data and for systems that perform mass updates. Columnar databases excel at analytical processing.
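The difference between the two layouts can be sketched in plain Python. This is an illustration only: the sample records and the function names `total_row_store` and `total_column_store` are hypothetical, and real columnar engines add compression and on-disk formats that are not modeled here.

```python
# The same three records, first laid out row by row.
rows = [
    {"id": 1, "region": "east", "amount": 10.0},
    {"id": 2, "region": "west", "amount": 25.5},
    {"id": 3, "region": "east", "amount": 4.5},
]

# Column layout: one list per column, built from the rows above.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_row_store(rows):
    """Aggregate by visiting every row and picking out one field."""
    return sum(row["amount"] for row in rows)


def total_column_store(columns):
    """Aggregate by reading only the single column that is needed."""
    return sum(columns["amount"])


print(total_row_store(rows), total_column_store(columns))
```

Both functions return the same total; the column version reads one list and never touches `id` or `region`, which is why aggregations tend to favor this layout.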
question
Compaction
answer
Compaction is the process of removing unused data and merging older data files. At the end of a compaction process, a set of data files will exist with only the current version of the data.
question
Document database
answer
Document databases focus on how data is accessed rather than how data is stored. Data access is optimized for discrete documents (typically an entire object graph rather than a single atomic row of data). Document-oriented databases may be physically structured as a columnar, BigTable, or key-value store; the implementation is not as important as the way data is accessed.
question
Dynamo
answer
Dynamo is a massively scalable key-value storage system that was developed at Amazon. Dynamo provides an always-on system that uses sophisticated versioning and conflict resolution techniques to be "self-healing" in the event of network failures. Amazon published the details in the white paper "Dynamo: Amazon's Highly Available Key-value Store".
question
Elasticity
answer
Elastic databases make it trivial to add nodes to a cluster as needed with no downtime. Read and write operations scale linearly as more machines are added.
question
Hadoop
answer
Hadoop is a framework for working with data-intensive distributed applications. Hadoop was based on Google's MapReduce paper. In addition to MapReduce functionality, Hadoop also provides location awareness and a set of common tools.
question
HBase
answer
A BigTable-style columnar database built on Hadoop. HBase has a large number of features that make it well suited for the enterprise (MapReduce, elastic storage, massive distribution, data compression).
question
HDFS
answer
Hadoop Distributed File System: a distributed, location-aware, replicated file system. HDFS has data balancing features: if any single node contains a disproportionate amount of data, the data can be easily redistributed to other nodes. Despite the name, HDFS cannot be directly mounted by an operating system without additional third-party libraries.
question
Key/Value Store
answer
Data is stored as an arbitrary value that is looked up via an arbitrary key. Frequent uses of key/value stores are shopping carts, session state, and other caching mechanisms.
question
MapReduce
answer
An algorithm initially developed at Google for performing parallel data processing. Other MapReduce implementations and frameworks have been developed for different databases. MapReduce workloads can be spread over thousands of nodes and multiple Map and Reduce phases. A Map operation is like a SQL SELECT statement - it produces zero or more results from one or more inputs. A Reduce operation is like a SQL GROUP BY combined with aggregate functions - it combines the results of multiple map operations.
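The Map and Reduce roles described above can be sketched as a single-process word count. This is a minimal illustration, not how any particular framework implements it: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real system would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit zero or more (key, value) pairs for one input."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine every value for one key, like GROUP BY with SUM."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["a b a", "b c"]))
```

An empty document produces no pairs at all, which matches the "zero or more results" behavior of a Map operation.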
question
MongoDB
answer
MongoDB is a scalable, high-performance, document-oriented database. MongoDB has a variety of features designed to bridge the gap between key/value stores and traditional RDBMSes; some of these features are ad hoc querying, secondary indexes, replication, and aggregation.
question
Network partitioning
answer
Network partitioning happens when multiple parts of a cluster become separated due to some type of failure.
question
Node
answer
A single computational unit in a cluster. Typically this would be a server or computer. If there are multiple nodes on a single computer, they may be referred to as virtual nodes.
question
NoSQL
answer
A generic term for any one of a variety of non-relational databases. Originally it didn't mean much of anything, but there have been attempts to co-opt the term into an acronym standing for "Not Only SQL".
question
Replication
answer
In Dynamo-based systems, data is written to multiple nodes, also called replicas. N copies of data will be stored in the system. Any time data is written, W replicas need to respond before the write is considered to be a success. Likewise, R replicas need to respond for a read to be considered a success. Replication settings can be tuned for different levels of performance; however, the general rule is that R + W > N (the number of replicas that must respond to a read plus the number that must respond to a write should be greater than the total number of replicas).
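One way to see why the rule is R + W > N: when it holds, every set of R replicas that answers a read must share at least one replica with every set of W replicas that acknowledged a write. The sketch below checks that by brute force for small clusters. The function names `quorums_overlap` and `always_intersect` are hypothetical and are not taken from any Dynamo-style product.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Return True when the R + W > N rule holds for the given settings."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


def always_intersect(n, r, w):
    """Brute force: does every read set share a replica with every write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )


print(quorums_overlap(3, 2, 2), always_intersect(3, 2, 2))
print(quorums_overlap(3, 1, 1), always_intersect(3, 1, 1))
```

With N = 3, the settings R = 2, W = 2 satisfy the rule, while R = 1, W = 1 do not: a read may then reach a replica that the latest write never touched.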
question
Riak
answer
Riak is a key/value store based on Amazon's Dynamo. Riak provides linear scaling, background replication, flexibility, and fault-tolerance.
question
Shard
answer
Not to be confused with chard, a shard is a segment of data. Sharding is a method of spreading data across multiple servers in a cluster to balance storage and CPU load. Shards are typically identified by a sharding key, although the mechanism varies from product to product.
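As a rough sketch of how a sharding key can select a shard, the example below hashes the key and takes the result modulo the number of shards. Hash-modulo placement is an assumption made for illustration; products may instead use key ranges, consistent hashing, or lookup tables, and the function name `shard_for` is hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # Use a stable hash so every process maps the same key to the same shard.
    # (Python's built-in hash() is randomized per process for strings.)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", shard_for(key, 4))
```

A drawback of this simple scheme is that changing `num_shards` remaps most keys, which is one reason the mechanism differs between products.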
question
Tombstone
answer
A special value written in a database to indicate that a record should be deleted. Data marked with a tombstone will still be present in the database until a compaction occurs.
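The interaction between tombstones and compaction can be sketched with a toy append-only store. This is a simplified model under stated assumptions: the class `TinyStore` and the `TOMBSTONE` sentinel are hypothetical, everything lives in one in-memory list, and compaction here drops tombstones immediately, which a distributed database might delay.

```python
TOMBSTONE = object()  # sentinel meaning "this key has been deleted"


class TinyStore:
    def __init__(self):
        self.log = []  # append-only list of (key, value) entries

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        # A delete is just another write: it appends a tombstone.
        self.log.append((key, TOMBSTONE))

    def get(self, key):
        # The newest entry for a key wins.
        for entry_key, value in reversed(self.log):
            if entry_key == key:
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        # Keep only the latest entry per key, then drop tombstoned keys.
        latest = {}
        for entry_key, value in self.log:
            latest[entry_key] = value
        self.log = [(k, v) for k, v in latest.items() if v is not TOMBSTONE]


store = TinyStore()
store.put("a", 1)
store.put("a", 2)
store.put("b", 3)
store.delete("b")
print(len(store.log))  # old versions and the tombstone are still stored
store.compact()
print(store.log)
```

Before `compact()` runs, the deleted record and the older version of `a` still occupy space in the log; afterwards only the current version of live data remains.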