We live in an era of rapidly advancing technology and Big Data. The use of smartphones, tablets and other gadgets is reaching saturation in many markets. About half of the world’s population has access to the internet. The popularity of social networking is spreading; for instance, Facebook has nearly two billion monthly active users.
Consequently, the volume of data we generate is growing exponentially. Factoring in the growth of IoT applications, it’s projected that by 2025 the amount of data the world produces will double every 12 months. Now that’s Big Data!
What’s Big Data?
Big Data is a generic term used to describe huge amounts of data - structured, semi-structured or unstructured. It refers to data that is measured in petabytes or more.
Sources of Big Data
The following list includes some of the primary sources that are generating large volumes of data in various forms.
- Music downloads and streaming
- Traffic patterns
- Medical records
- Stock exchanges
- Social networking services such as Facebook, LinkedIn, Snapchat and Twitter, which generate large volumes of data as users upload images, text and videos. Facebook alone generates over 500 terabytes of data daily.
- Vehicles, with their hundreds of sensors, a number that will increase with the introduction of autonomous driving systems.
- Aircraft, with their thousands of sensors, multiplied by over 100,000 flights a day worldwide.
The list goes on. Organizations hold very large data sets in many different forms, which increases the complexity of managing Big Data. Stiff competition among these organizations increases the need to respond quickly to customers, both to provide a great user experience and to attract more customers. It therefore makes sense to organize the data in a distributed way, which improves scalability, keeps the data highly available and shortens response times.
Relational database management systems (RDBMS) cannot deliver the performance, scalability and flexibility that next-generation data-intensive applications require. Traditional database systems and RDBMS handle structured data well, where the table structure is defined in advance. However, they struggle with unstructured data, where the format is not fixed: one instance of an entity may arrive in one format and another instance of the same entity in a different format.
Unstructured data is growing far more rapidly than structured data. That is why databases are becoming more schema-less and moving away from traditional schema-full architectures. A NoSQL (not only SQL) database can meet these needs and works well with cloud deployments, because NoSQL can handle both structured and unstructured data.
Unstructured data includes:
- User and session data
- Videos and images
- Time-series data from IoT devices
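The schema-less idea described above can be illustrated with a minimal sketch. It uses plain Python dictionaries as stand-ins for documents; the `users` records, their field names and the helper `field_names` are hypothetical and do not come from any particular NoSQL product.

```python
# Two records for the same hypothetical "user" entity with different fields.
# A schema-less store would accept both; no table definition is needed first.
users = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ben", "phones": ["555-0100"], "last_login": "2017-05-01"},
]


def field_names(documents):
    """Return the sorted union of field names seen across all documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return sorted(names)


def get_field(doc, field, default=None):
    """Read a field that may be absent from a given document."""
    return doc.get(field, default)


print(field_names(users))
# ['email', 'id', 'last_login', 'name', 'phones']
print(get_field(users[1], "email"))
# None
```

The trade-off is that the application, rather than the database, has to cope with fields that are missing from some records.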
NoSQL can handle the three Vs
- Volume: Increasing database size, measured in petabytes
- Velocity: Quick generation of data
- Variety: Structured, semi-structured and unstructured data
The four categories of NoSQL
- Document: Cloudant, CouchDB and MongoDB
- Key-value: Coherence, Memcached and Redis
- Column family: Google Bigtable, Apache HBase and Cassandra
- Graph: Neo4j
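To make the four categories more concrete, here is a rough sketch of the shape of data each one stores, modeled with ordinary Python structures. This is an illustration only: the keys, values and the `neighbors` helper are assumptions, and none of this reflects the actual API of the products listed above.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"session:42": b"serialized-session-bytes"}

# Document: a self-describing, possibly nested record.
document = {"_id": "user:1", "name": "Asha", "address": {"city": "Pune"}}

# Column family: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Asha"},
        "activity": {"last_login": "2017-05-01"},
    }
}

# Graph: nodes connected by labeled edges (source, label, destination).
graph_edges = [
    ("user:1", "FOLLOWS", "user:2"),
    ("user:2", "FOLLOWS", "user:3"),
]


def neighbors(edges, node, label):
    """Return the nodes reachable from `node` over edges with `label`."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(neighbors(graph_edges, "user:1", "FOLLOWS"))
# ['user:2']
```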
Because NoSQL means “not only SQL”, these databases can support SQL-like languages as well as other query languages for retrieving data. In that sense, NoSQL changes how data is stored and managed.
NoSQL databases generally do not provide all the ACID (Atomicity, Consistency, Isolation, Durability) properties. Instead, they offer a weaker form of consistency in exchange for performance, scalability and high availability. That makes them a poor fit for applications in which every read must immediately reflect the latest write, but applications in which users may briefly see different versions of the data can accept it. An example is social media, where a person uploads an image but is not able to view the new image immediately. This behavior, known as eventual consistency, is acceptable here.
Most NoSQL databases lack the ability to join. As a result, queries against a NoSQL database are generally simple, and several queries may have to be run to assemble the desired result. NoSQL databases are also typically built on a distributed architecture with no single point of failure.
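The image-upload scenario can be simulated with a toy model. The class below is a hypothetical, single-process sketch with one primary copy and one replica. Real systems replicate over a network and in the background, so treat it only as an illustration of why a read can briefly return stale data.

```python
class EventuallyConsistentStore:
    """Toy store: writes land on the primary and reach the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes not yet copied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        """Apply all pending writes to the replica, in order."""
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("photo:1", "beach.jpg")

stale = store.read_from_replica("photo:1")  # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("photo:1")  # replica now has the write

print(stale, fresh)
# None beach.jpg
```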
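Running several simple queries and combining them in the application might look like the sketch below. The two in-memory collections and the functions `find_posts` and `find_users_by_ids` are hypothetical stand-ins for whatever lookup calls a given database provides.

```python
# Two hypothetical collections; in a real system each lives in the database.
user_collection = {
    "u1": {"name": "Asha"},
    "u2": {"name": "Ben"},
}
post_collection = [
    {"id": "p1", "author_id": "u1", "text": "Hello"},
    {"id": "p2", "author_id": "u2", "text": "Big Data!"},
    {"id": "p3", "author_id": "u1", "text": "NoSQL"},
]


def find_posts():
    """First query: fetch the posts."""
    return list(post_collection)


def find_users_by_ids(user_ids):
    """Second query: fetch only the users referenced by those posts."""
    return {uid: user_collection[uid] for uid in user_ids if uid in user_collection}


def posts_with_author_names():
    """Combine both results in application code instead of a database join."""
    posts = find_posts()
    authors = find_users_by_ids({post["author_id"] for post in posts})
    return [
        {"text": post["text"], "author": authors.get(post["author_id"], {}).get("name")}
        for post in posts
    ]


print(posts_with_author_names()[0])
# {'text': 'Hello', 'author': 'Asha'}
```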
What’s not in NoSQL
- No join support. Joins across many machines are expensive, which is one reason relational databases are harder to scale out. By avoiding joins, NoSQL databases scale more easily and can deliver high performance.
- No support for complex transactions.
- No constraint support.
- Not all the ACID properties are supported.
Thus, transaction support and constraint support must be implemented at the application level.
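As one example of moving a constraint into the application, the sketch below enforces a unique-email rule in code before a record is stored. The `UserRepository` class and its in-memory dictionaries are hypothetical. In a real distributed store, this check-then-write would also need an atomic operation from the database to be safe under concurrent writes.

```python
class UserRepository:
    """Toy repository that enforces constraints the database would not."""

    def __init__(self):
        self._docs = {}       # user_id -> document
        self._emails = set()  # emails already in use

    def insert(self, user_id, doc):
        email = doc.get("email")
        if not email:
            raise ValueError("email is required")
        if user_id in self._docs:
            raise ValueError(f"duplicate user id: {user_id}")
        if email in self._emails:
            raise ValueError(f"email already in use: {email}")
        self._docs[user_id] = dict(doc)
        self._emails.add(email)

    def count(self):
        return len(self._docs)


repo = UserRepository()
repo.insert("u1", {"email": "asha@example.com"})
try:
    repo.insert("u2", {"email": "asha@example.com"})
except ValueError as err:
    print(err)
# email already in use: asha@example.com
```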
When to use NoSQL
- When storing and retrieving large amounts of data.
- When storing relationships between the elements is not important.
- When dealing with a growing list of elements: Twitter posts, internet server logs, blogs, etc.
- When data is not structured or its structure is changing rapidly. For example, a record may have five attributes today but 15 soon after, with the number growing even further.
When not to use NoSQL
- When joins are needed.
- When transactions are required, as in bank transfer applications. An RDBMS is better suited to these cases.