Data Engineering, Data Science and Hadoop

Remote Data Science Jobs: 5 High-Demand Roles for Career Growth

Data Science Dojo

OCTOBER 31, 2024

Remote work quickly transitioned from a perk to a necessity, and data science—already digital at heart—was poised for this change. For data scientists, this shift has opened up a global market of remote data science jobs, with top employers now prioritizing skills that allow remote professionals to thrive.

Data Science

Data Science Data Scientist Machine Learning

Integration of Python with Hadoop and Spark

Analytics Vidhya

MAY 30, 2021

This article was published as a part of the Data Science Blogathon. Introduction Big data is the collection of data that is vast.

Hadoop

Hadoop Python Big Data

An Introduction to Hadoop Ecosystem for Big Data

Analytics Vidhya

MAY 27, 2022

This article was published as a part of the Data Science Blogathon. Introduction Every day the internet generates billions of bytes of data. Every time you put on a dog filter, watch cat videos or order food from your favourite restaurant, you generate data.

Hadoop

Hadoop Big Data Data Science

HIVE – A DATA WAREHOUSE IN HADOOP FRAMEWORK

Analytics Vidhya

MAY 30, 2021

This article was published as a part of the Data Science Blogathon. Different components in the Hadoop Framework. Introduction Hadoop is […].

Hadoop

Hadoop Data Warehouse Data Science Analytics

Hadoop Ecosystem

Analytics Vidhya

OCTOBER 9, 2022

This article was published as a part of the Data Science Blogathon. Introduction Apache Hadoop is an open-source framework designed to facilitate interaction with big data. Still, for those unfamiliar with this technology, one question arises, what is big data?

Hadoop

Hadoop Apache Hadoop Big Data

Frequent Itemset Mining Using MapReduce on Hadoop

Analytics Vidhya

SEPTEMBER 14, 2022

This article was published as a part of the Data Science Blogathon. Introduction Every Data Science enthusiast’s journey goes through one of the most classical data problems – Frequent Itemset Mining, also sometimes referred to as Association Rule Mining or Market Basket Analysis.

Hadoop

Hadoop Data Science Analytics

Data Science Blogathon 30th Edition- Women in Data Science

Analytics Vidhya

MARCH 8, 2023

The Biggest Data Science Blogathon is now live! “Knowledge is power. Sharing knowledge is the key to unlocking that power.” (Martin Uzochukwu Ugwu) Analytics Vidhya is back with the largest data-sharing knowledge competition: The Data Science Blogathon.

Data Science

Data Science Analytics Apache Hadoop

Introduction to Apache Sqoop

Analytics Vidhya

JULY 25, 2022

This article was published as a part of the Data Science Blogathon. Introduction Apache Sqoop is a big data engine for transferring data between Hadoop and relational database servers. Big Data Sqoop can also be […].

Hadoop

Hadoop Big Data Data Engineer

Step-by-Step Roadmap to Become a Data Engineer in 2023

Analytics Vidhya

JANUARY 2, 2023

While not all of us are tech enthusiasts, we all have a fair knowledge of how Data Science works in our day-to-day lives. All of this is based on Data Science which is […].

Data Engineer

Data Engineer Data Engineering

Learn Everything about MapReduce Architecture & its Components

Analytics Vidhya

JULY 5, 2022

This article was published as a part of the Data Science Blogathon. Introduction MapReduce is part of the Apache Hadoop ecosystem, a framework for large-scale data processing. Other components of Apache Hadoop include Hadoop Distributed File System (HDFS), YARN, and Apache Pig.

Apache Hadoop

Apache Hadoop Hadoop Data Science Algorithm

Data Science Blogathon 28th Edition

Analytics Vidhya

JANUARY 8, 2023

Hey, are you the data science geek who spends hours coding, learning a new language, or just exploring new avenues of data science? If all of these describe you, then this Blogathon announcement is for you!

Data Science

Data Science Analytics Hadoop

Workings of Hadoop Distributed File System (HDFS)

Analytics Vidhya

MAY 5, 2022

This article was published as a part of the Data Science Blogathon. Introduction This article will discuss the Hadoop Distributed File System, its features, components, functions, and benefits. Hadoop is a powerful platform for supporting an enormous variety of data applications.

Hadoop

Hadoop Data Science Analytics

Data Science Blogathon 26th Edition

Analytics Vidhya

NOVEMBER 7, 2022

Hello, fellow data science enthusiasts, did you miss imparting your knowledge in the previous blogathon due to a time crunch? Well, it’s okay because we are back with another blogathon where you can share your wisdom on numerous data science topics and connect with the community of fellow enthusiasts.

Data Science

Data Science Analytics Hadoop

Get to Know Apache Flume from Scratch!

Analytics Vidhya

MAY 12, 2022

This article was published as a part of the Data Science Blogathon. Introduction Apache Flume, a part of the Hadoop ecosystem, was developed by Cloudera. Initially, it was designed to handle log data solely, but later, it was developed to process event data.

Hadoop

Hadoop Data Science Analytics

Mr. Pavan’s Data Engineering Journey Drives Business Success

Analytics Vidhya

JUNE 24, 2023

He is an experienced data engineer with a passion for problem-solving and a drive for continuous growth. Thus, providing valuable insights into the field of data engineering. Introduction We had an amazing opportunity to learn from Mr. Pavan.

Data Engineer

Data Engineer Data Engineering

Partitioning and Bucketing in Hive

Analytics Vidhya

JUNE 30, 2022

This article was published as a part of the Data Science Blogathon. Introduction Hive is a popular data warehouse built on top of Hadoop that is used by companies like Walmart, TikTok, and AT&T. It is an important technology for data engineers to learn and master.

Data Warehouse

Data Warehouse Hadoop Data Engineer Data Engineering

A Brief Introduction to Apache HBase and it’s Architecture

Analytics Vidhya

OCTOBER 12, 2022

This article was published as a part of the Data Science Blogathon. Introduction Since the 1970s, relational database management systems have solved the problems of storing and maintaining large volumes of structured data.

Hadoop

Hadoop Big Data Data Science

Most Frequently Asked Apache HBase Interview Questions

Analytics Vidhya

AUGUST 1, 2022

This article was published as a part of the Data Science Blogathon. Introduction HBase is a column-oriented non-relational database management system that operates on Hadoop Distributed File System (HDFS). HBase provides a fault-tolerant manner of storing sparse data sets, which are prevalent in several big data use cases.

Hadoop

Hadoop Big Data Data Science

An Introduction to MapReduce with a Word Count Example

Analytics Vidhya

MAY 18, 2022

This article was published as a part of the Data Science Blogathon. Introduction Hadoop facilitates the processing of large datasets in a distributed manner and provides the foundation on which other services and applications can be built. MapReduce and HDFS are the two main components of Hadoop.

Hadoop

Hadoop Data Science Analytics

Introduction to Partitioned hive table and PySpark

Analytics Vidhya

OCTOBER 28, 2021

This article was published as a part of the Data Science Blogathon. What is the need for Hive? The official description of Hive is: ‘Apache Hive data warehouse software project built on top of Apache Hadoop for providing data query and analysis.’

Apache Hadoop

Apache Hadoop Data Warehouse Hadoop SQL

Warehouse, Lake or a Lakehouse – What’s Right for you?

Analytics Vidhya

OCTOBER 10, 2022

This article was published as a part of the Data Science Blogathon. Introduction Most of you would know the different approaches for building a data and analytics platform. You would have already worked on systems that used traditional warehouses or Hadoop-based data lakes. Selecting one among […].

Data Lakes

Data Lakes Hadoop Data Science Analytics

Essential data engineering tools for 2023: Empowering for management and analysis

Data Science Dojo

JULY 6, 2023

Data engineering tools are software applications or frameworks specifically designed to facilitate the process of managing, processing, and transforming large volumes of data. Essential data engineering tools for 2023 Top 10 data engineering tools to watch out for in 2023 1.

Data Engineer

Data Engineer Data Engineering

An Overview on DDL Commands in Apache Hive

Analytics Vidhya

APRIL 29, 2022

This article was published as a part of the Data Science Blogathon. Introduction Apache Hadoop is the most used open-source framework in the industry to store and process large data efficiently. Hive is built on top of Hadoop to provide data storage, query and processing capabilities.

Apache Hadoop

Apache Hadoop Hadoop SQL Data Science

What is Apache Impala- Features and Architecture

Analytics Vidhya

AUGUST 17, 2022

This article was published as a part of the Data Science Blogathon. Introduction Impala is an open-source and native analytics database for Hadoop. Vendors such as Cloudera, Oracle, MapR, and Amazon have shipped Impala. If you want to learn all things Impala, you’ve come to the right place.

Hadoop

Hadoop Data Science Database Analytics

Apache Zookeeper Architecture and Installation

Analytics Vidhya

AUGUST 3, 2022

This article was published as a part of the Data Science Blogathon. Introduction Zookeeper in Hadoop can be considered a centralized repository where distributed applications can put data into and retrieve data from. For clarity, Zookeeper can be […].

Hadoop

Hadoop Data Science Analytics

Coding vs Data Science: A comprehensive guide to unraveling the differences

Data Science Dojo

JULY 7, 2023

In the technology-driven world we inhabit, two skill sets have risen to prominence and are a hot topic: coding vs data science. Coding vs Data Science Coding goes beyond just software creation, impacting fields as diverse as healthcare, finance, and entertainment. What is Data Science?

Data Science

Data Science Data Scientist Python Decision Trees

How Rocket Companies modernized their data science solution on AWS

AWS Machine Learning Blog

FEBRUARY 21, 2025

Rocket’s legacy data science environment challenges: Rocket’s previous data science solution was built around Apache Spark and combined the use of a legacy version of the Hadoop environment and vendor-provided Data Science Experience development tools.

Data Science

Data Science AWS Hadoop Data Scientist

Apache Pig Architecture and Execution Modes

Analytics Vidhya

JULY 10, 2022

This article was published as a part of the Data Science Blogathon. Apache Pig is built on top of Hadoop. It provides a data processing flow for large data sets. Apache Pig offers a high-level language. It is an alternative to writing MapReduce (MR) code directly.

Hadoop

Hadoop Data Science Analytics

Most Asked Interview Questions on Apache Spark

Analytics Vidhya

AUGUST 26, 2022

This article was published as a part of the Data Science Blogathon. Introduction Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark’s in-memory data processing capabilities can make it up to 100 times faster than Hadoop MapReduce for some workloads. The most […].

Hadoop

Hadoop Data Science Analytics

Big data engineering simplified: Exploring roles of distributed systems

Data Science Dojo

JULY 24, 2023

They allow data processing tasks to be distributed across multiple machines, enabling parallel processing and scalability. It involves various technologies and techniques that enable efficient data processing and retrieval. Stay tuned for an insightful exploration into the world of Big Data Engineering with Distributed Systems!

Big Data

Big Data Data Engineer Data Engineering

A Guide to Choose the Best Data Science Bootcamp

Data Science Dojo

JULY 3, 2024

Data science bootcamps are intensive short-term educational programs designed to equip individuals with the skills needed to enter or advance in the field of data science. They cover a wide range of topics, ranging from Python, R, and statistics to machine learning and data visualization.

Data Science

Data Science Machine Learning Data Visualization

Business Analytics vs Data Science: Which One Is Right for You?

Pickl AI

DECEMBER 25, 2024

Summary: Business Analytics focuses on interpreting historical data for strategic decisions, while Data Science emphasizes predictive modeling and AI. Introduction In today’s data-driven world, businesses increasingly rely on analytics and insights to drive decisions and gain a competitive edge.

Data Science

Data Science Analytics Data Scientist

Basic Concept Behind Apache Hive and Elasticsearch

Analytics Vidhya

SEPTEMBER 4, 2022

This article was published as a part of the Data Science Blogathon. Introduction I’ve always wondered how big companies like Google process their information or how companies like Netflix can perform searches in concise times.

Data Science

Data Science Analytics Hadoop

Apache Flume Interview Questions

Analytics Vidhya

JULY 27, 2022

This article was published as a part of the Data Science Blogathon. Introduction to Apache Flume Apache Flume is a data ingestion mechanism for gathering, aggregating, and transmitting huge amounts of streaming data from diverse sources, such as log files, events, and so on, to a centralized data storage.

Data Science

Data Science Analytics Hadoop

Becoming a Data Engineer: 7 Tips to Take Your Career to the Next Level

Data Science Connect

JANUARY 27, 2023

Data engineering is a crucial field that plays a vital role in the data pipeline of any organization. It is the process of collecting, storing, managing, and analyzing large amounts of data, and data engineers are responsible for designing and implementing the systems and infrastructure that make this possible.

Data Engineer

Data Engineer Data Engineering

Types of Tables in Apache Hive – A Quick Overview

Analytics Vidhya

OCTOBER 23, 2020

Overview Apache Hive is a must-know tool for anyone interested in data science and data engineering. Learn about the different types of tables […].

Data Engineer

Data Engineer Data Engineering

An Introduction to Apache Pig For Absolute Beginners!

Analytics Vidhya

AUGUST 8, 2021

This article was published as a part of the Data Science Blogathon. This article is focused on Apache Pig. It is a high-level […].

Data Science

Data Science Analytics Hadoop

Getting Your First Job in Data Science

Data Science 101

JUNE 10, 2019

Getting your first data science job might be challenging, but it’s possible to achieve this goal with the right resources. Before jumping into a data science career, there are a few questions you should be able to answer: How do you break into the profession? What skills do you need to become a data scientist?

Data Science

Data Science Data Scientist Data Analyst Data Engineering

Getting started with Apache Pig!

Analytics Vidhya

JUNE 24, 2022

This article was published as a part of the Data Science Blogathon. Apache Pig is capable of working on any kind of data, similar to a pig who can eat anything. Introduction After reading the heading Apache Pig, the first question that hits every mind is, why the word Pig? Pig is nothing but a […].

Data Science

Data Science Analytics Hadoop

A beginner tale of Data Science

Becoming Human

JANUARY 23, 2023

Data Science: you have heard this term all over the internet, and it is also the most worrying topic for newcomers who want to enter the world of data but don’t know what it actually means. I’m not saying those are incorrect or wrong, even though every article has its own mindset behind the term ‘Data Science’.

Data Science

Data Science Big Data Deep Learning

Introduction to Apache Kafka: Fundamentals and Working

Analytics Vidhya

DECEMBER 30, 2022

This article was published as a part of the Data Science Blogathon. Introduction Have you ever wondered how Instagram recommends similar kinds of reels while you are scrolling through your feed or ad recommendations for similar products that you were browsing on Amazon?

Apache Kafka

Apache Kafka Data Science Analytics
