Getting Started with Big Data & Hadoop

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction to Big Data & Hadoop: The amount of data in our world is growing exponentially. It is estimated that at least 2.5

Hadoop 270

Getting Started with NoSQL Database Called HBase

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. HBase is an open-source, non-relational, scalable, distributed database written in Java. It is developed as part of the Hadoop ecosystem and runs on top of HDFS.

Database 287

Introduction to Apache Sqoop

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction: Apache Sqoop is a big data tool for transferring data between Hadoop and relational database servers. Sqoop transfers data from an RDBMS (Relational Database Management System) such as MySQL or Oracle to HDFS (Hadoop Distributed File System).

Hadoop 353

A Brief Introduction to Apache HBase and Its Architecture

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction: Since the 1970s, relational database management systems have solved the problems of storing and maintaining large volumes of structured data.

Hadoop 353

Most Frequently Asked Apache HBase Interview Questions

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction: HBase is a column-oriented, non-relational database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase provides a fault-tolerant way of storing sparse data sets, which are common in many big data use cases.

Hadoop 360

Introduction to Partitioned Hive Table and PySpark

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. What is the need for Hive? The official description of Hive is: ‘Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.’

What is Apache Impala- Features and Architecture

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction: Impala is an open-source, native analytic database for Hadoop. It is shipped by vendors such as Cloudera, Oracle, MapR, and Amazon. If you want to learn all things Impala, you’ve come to the right place.

Hadoop 291