Remove Article Remove Hadoop Remove SQL
article thumbnail

An Introduction to Data Analysis using Spark SQL

Analytics Vidhya

This article was published as a part of the Data Science Blogathon Introduction Spark is an analytics engine that is used by data scientists all over the world for Big Data Processing. It is built on top of Hadoop and can process batch as well as streaming data. Hadoop is a framework for distributed computing that […].

article thumbnail

Performance Tuning Practices in Hive

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction Apache Hive is a data warehouse system built on top of Hadoop which gives the user the flexibility to write complex MapReduce programs in form of SQL- like queries.

Hadoop 333
professionals

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Introduction to Partitioned hive table and PySpark

Analytics Vidhya

This article was published as a part of the Data Science Blogathon What is the need for Hive? The official description of Hive is- ‘Apache Hive data warehouse software project built on top of Apache Hadoop for providing data query and analysis.

article thumbnail

An Overview on DDL Commands in Apache Hive

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction Apache Hadoop is the most used open-source framework in the industry to store and process large data efficiently. Hive is built on the top of Hadoop for providing data storage, query and processing capabilities.

article thumbnail

Getting Started with NoSQL Database Called HBase

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. It is developed as a part of the Hadoop ecosystem and runs on top of HDFS. HBase is an open-source non-relational, scalable, distributed database written in Java. It provides random real-time read and write access to the given data. It is possible to […].

Database 287
article thumbnail

Partitioning and Bucketing in Hive

Analytics Vidhya

This article was published as a part of the Data Science Blogathon. Introduction Hive is a popular data warehouse built on top of Hadoop that is used by companies like Walmart, Tiktok, and AT&T. It is an important technology for data engineers to learn and master. It uses a declarative language called HQL, also known […].

article thumbnail

How to become a data scientist – Key concepts to master data science

Data Science Dojo

Python, R, and SQL: These are the most popular programming languages for data science. Hadoop and Spark: These are like powerful computers that can process huge amounts of data quickly. Python, R, and SQL: These are the most popular programming languages for data science. Statistics provides the language to do this effectively.