This article was published as a part of the Data Science Blogathon. Introduction: Spark is an analytics engine used by data scientists all over the world for big data processing. It can run on top of Hadoop and can process batch as well as streaming data. Hadoop is a framework for distributed computing that […].
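The batch-processing model described here can be sketched in plain Python: the toy word count below (invented data, no cluster) mirrors the map and reduce phases that engines like Hadoop and Spark distribute across many machines.

```python
from collections import Counter
from functools import reduce

# Toy "partitions" standing in for blocks of a distributed file.
partitions = [
    "spark processes batch data",
    "spark also processes streaming data",
]

# Map phase: each partition is tokenized and counted independently.
mapped = [Counter(p.split()) for p in partitions]

# Reduce phase: partial counts are merged into one final result.
totals = reduce(lambda a, b: a + b, mapped)

print(totals["spark"])  # 2
print(totals["data"])   # 2
```

In a real cluster the map step runs in parallel on each data block and only the small partial counts travel over the network to be reduced.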
Apache Hive is a data warehouse system built on top of Hadoop that gives users the flexibility to express complex MapReduce programs in the form of SQL-like queries.
What is the need for Hive? The official description of Hive is: 'Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis.'
Apache Hadoop is the most widely used open-source framework in the industry for storing and processing large data efficiently. Hive is built on top of Hadoop to provide data storage, query, and processing capabilities.
HBase is an open-source, non-relational, scalable, distributed database written in Java. Developed as a part of the Hadoop ecosystem, it runs on top of HDFS and provides random, real-time read and write access to data. It is possible to […].
Hive is a popular data warehouse built on top of Hadoop that is used by companies like Walmart, TikTok, and AT&T. It is an important technology for data engineers to learn and master. It uses a declarative language called HQL, also known […].
Python, R, and SQL: These are the most popular programming languages for data science. Hadoop and Spark: These are like powerful computers that can process huge amounts of data quickly. Statistics provides the language to reason about that data effectively.
Summary: This article compares Spark vs Hadoop, highlighting Spark’s fast, in-memory processing and Hadoop’s disk-based, batch processing model. Introduction: Apache Spark and Hadoop are potent frameworks for big data processing and distributed computing.
In this guide, we will discuss Apache Sqoop: the import and export processes with their different modes, as well as Sqoop-Hive integration. I will go over Apache Sqoop in depth so that whenever you […].
At first glance, they may seem like two sides of the same coin, but a closer look reveals distinct differences and unique career opportunities. This article aims to demystify these domains, shedding light on what sets them apart, the essential skills they demand, and how to navigate a career path in either field. What is Coding?
This article explains what PySpark is, some common PySpark functions, and data analysis of the New York City Taxi & Limousine Commission Dataset using PySpark. With PySpark, you can write Python and SQL-like commands to manipulate and analyze data in a distributed processing environment.
This article helps you choose the right path by exploring their differences, roles, and future opportunities. Descriptive analytics is a fundamental method that summarizes past data using tools like Excel or SQL to generate reports. Big data platforms such as Apache Hadoop and Spark help handle massive datasets efficiently.
Summary: This article provides a comprehensive guide to the most frequently asked Big Data interview questions, covering beginner to advanced topics to help aspiring candidates excel. Familiarise yourself with essential tools like Hadoop and Spark. What is YARN in Hadoop?
Alation catalogs and crawls all of your data assets, whether they are in a traditional relational data set (MySQL, Oracle, etc.), a SQL-on-Hadoop system (Presto, SparkSQL, etc.), a BI visualization, or a file system such as HDFS or AWS S3.
With expertise in programming languages like Python, Java, and SQL, and knowledge of big data technologies like Hadoop and Spark, data engineers optimize pipelines for data scientists and analysts to access valuable insights efficiently. Big Data Technologies: Hadoop, Spark, etc. ETL Tools: Apache NiFi, Talend, etc.
In this article, we will delve into the concept of data lakes, explore their differences from data warehouses and relational databases, and discuss the significance of data version control in the context of large-scale data management. This is particularly advantageous when dealing with exponentially growing data volumes.
There are many programming languages, and in this article we will explore 8 that play a crucial role in the realm of Data Science. SQL: Mastering Data Manipulation. Structured Query Language (SQL) is a language designed specifically for managing and manipulating databases.
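The kind of data manipulation SQL excels at can be sketched with Python's stdlib sqlite3 module; the table and values below are invented purely for illustration.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 50.0)],
)

# Aggregate with GROUP BY: total amount per region.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 170.0), ('south', 80.0)]
```

The same declarative GROUP BY style carries over to warehouse engines like Hive, where the query runs over distributed storage instead of a local file.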
This article explores the key fundamentals of Data Engineering, highlighting its significance and providing a roadmap for professionals seeking to excel in this vital field. Among these tools, Apache Hadoop, Apache Spark, and Apache Kafka stand out for their unique capabilities and widespread usage.
This article explores the top 10 AI jobs in India and the essential skills required to excel in these roles: proficiency in programming languages like Python and SQL, familiarity with SQL for database management, and familiarity with big data frameworks (Hadoop, Apache Spark) for handling large datasets effectively.
In my 7 years of Data Science journey, I’ve been exposed to a number of different databases, including but not limited to Oracle Database, MS SQL, MySQL, EDW, and Apache Hadoop. Now let’s get into the main topic of the article. I’ll leave these methods to you to research and familiarize yourself with on your own.
This article will serve as the ultimate guide to choosing between Data Science and Data Analytics. By the end, you will fully understand what it entails to be a data scientist or a data analyst. But before going into the main purpose of this article: what is data?
Variety: Data comes in a myriad of formats, including text, images, videos, and more. Examples include Excel files, SQL databases, and data warehouses. Advances in big data technology like Hadoop, Hive, and Spark, together with machine learning algorithms, have made it possible to interpret and utilize this variety of data effectively.
The fields have evolved such that to work as a data analyst who views, manages and accesses data, you need to know Structured Query Language (SQL) as well as math, statistics, data visualization (to present the results to stakeholders) and data mining. It’s also necessary to understand data cleaning and processing techniques.
That’s why, in this article, we’ll explore why data science is not only a good career choice but also a thriving and promising one. Big data tools : Familiarity with big data tools like Hadoop, Spark, and NoSQL databases is advantageous for handling large-scale datasets. Is data science a good career?
This article compares Tableau and Power BI, examining their features, pricing, and suitability for different organisations, to guide readers in selecting the right BI tool for their needs in 2024. Tableau supports many data sources, including cloud databases, SQL databases, and Big Data platforms.
In this article, let’s look at how to enhance problem-solving skills as a data engineer. Practice coding in the languages used in data engineering, like Python, SQL, Scala, or Java, and get hands-on with distributed frameworks (e.g., Hadoop, Spark).
Because they are the most likely to communicate data insights, they’ll also need to know SQL, and visualization tools such as Power BI and Tableau as well. Like their counterparts in the machine learning world, engineers need to know a variety of scripted languages such as SQL for database management, Scala, Java, and of course Python.
For instance, technical power users can explore the actual data through Compose , the intelligent SQL editor. Those less familiar with SQL can search for technical terms using natural language. The data catalog supports human understanding by surfacing useful metadata (like usage statistics, conversations, and wiki-like articles).
This article throws light on the 10 best cities for Data Scientists offering the best Data Science salary in India. With the abundance of data available, organizations across various industries are leveraging data science to gain valuable insights and make informed decisions. What is Data Science?
In this article, we will discuss the importance of data versioning control in machine learning and explore various methods and tools for implementing it with different types of data sources. LakeFS is fully compatible with many ecosystems of data engineering tools such as AWS, Azure, Spark, Databricks, MLflow, Hadoop, and others.
This article endeavors to alleviate those confusions. It can include technologies that range from Oracle, Teradata and Apache Hadoop to Snowflake on Azure, RedShift on AWS or MS SQL in the on-premises data center, to name just a few. While this is encouraging, it is also creating confusion in the market.
This article will discuss managing unstructured data for AI and ML projects. Here’s the structured equivalent of this same data in tabular form: With structured data, you can use query languages like SQL to extract and interpret information. Popular data lake solutions include Amazon S3 , Azure Data Lake , and Hadoop.
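As a sketch of that structured-data point, a small tabular dataset can be queried with SQL via Python's stdlib sqlite3; the table, columns, and rows here are invented for illustration.

```python
import sqlite3

# A small structured table: every row follows the same fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Ana", 34, "Lima"), ("Ben", 28, "Oslo"), ("Cho", 41, "Lima")],
)

# Because the schema is known, SQL can extract exactly what we need.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY age", ("Lima",)
    )
]
print(names)  # ['Ana', 'Cho']
```

Unstructured data (free text, images, audio) has no such schema, which is why it typically lands in a data lake first and needs extra processing before queries like this are possible.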
This article will guide you through the concept of a data quality framework, its essential components, and how to implement it effectively within your organization. Apache Griffin is an open-source data quality solution for big data environments, particularly within the Hadoop and Spark ecosystems.