When it comes to storing data, there are two main options: data lakes and data warehouses. What is a data lake? It holds an enormous amount of raw data in its original format until that data is required for analytics applications. Which one is right for your business?
In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident.
Most data scientists are familiar with the concept of time series data and work with it often. The time series database (TSDB), however, is still an underutilized tool in the data science community. Typically, time series analysis is performed either on CSV files or data lakes.
Our goal was to improve the user experience of an existing application used to explore the counters and insights data. The data is stored in a data lake and retrieved with SQL using Amazon Athena. The following figure shows a search query that was translated to SQL and run. The challenge is to ensure quality.
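As a rough illustration of the query pattern described here, the following is a minimal sketch of running a SQL query against a data lake through Amazon Athena with boto3; the database, table, and S3 output location are hypothetical placeholders, not details from the original application.

    import boto3

    # Hypothetical names: adjust the database, table, and output bucket to your environment.
    athena = boto3.client("athena", region_name="us-east-1")

    response = athena.start_query_execution(
        QueryString="SELECT counter_name, SUM(value) AS total FROM counters GROUP BY counter_name",
        QueryExecutionContext={"Database": "analytics_lake"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/queries/"},
    )
    print(response["QueryExecutionId"])  # poll this ID for status and results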
Many of these applications are complex to build because they require collaboration across teams and the integration of data, tools, and services. Data engineers use data warehouses, data lakes, and analytics tools to load, transform, clean, and aggregate data. option("multiLine", "true").option("header", "true")
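The reader options quoted above look like part of a Spark DataFrame read; a minimal PySpark sketch of loading a CSV file from a data lake with those options might look like the following (the S3 path is a placeholder).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("lake-ingest").getOrCreate()

    # Read a CSV file from the lake, honoring headers and multi-line quoted fields.
    df = (
        spark.read
        .option("multiLine", "true")
        .option("header", "true")
        .csv("s3://example-data-lake/raw/events.csv")  # placeholder path
    )
    df.printSchema()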
With this full-fledged solution, you don’t have to spend all your time and effort combining different services or duplicating data. Overview of OneLake: Fabric features a lake-centric architecture, with a central repository known as OneLake. Now, we can save the data as Delta tables to use later for sales analytics.
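Saving a DataFrame as a Delta table for later analytics, as mentioned above, could look roughly like this in PySpark; the table name and sample rows are illustrative, and this is not the exact OneLake setup from the post.

    from pyspark.sql import SparkSession

    # Assumes a Spark session with the Delta Lake extensions already configured.
    spark = SparkSession.builder.appName("sales-delta").getOrCreate()

    # Small illustrative DataFrame standing in for the ingested sales data.
    sales_df = spark.createDataFrame(
        [("2024-01-01", "EMEA", 1200.0), ("2024-01-02", "APAC", 950.0)],
        ["order_date", "region", "amount"],
    )

    # Persist as a Delta table for downstream sales analytics (illustrative name).
    sales_df.write.format("delta").mode("overwrite").saveAsTable("sales_analytics")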
Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks. Apache Hive was used to provide a tabular interface to data stored in HDFS, and to integrate with Apache Spark SQL. This also led to a backlog of data that needed to be ingested.
Discover the nuanced differences between Data Lakes and Data Warehouses. Data management in the digital age has become a crucial aspect of business, and two prominent concepts in this realm are Data Lakes and Data Warehouses. A data lake acts as a repository for storing all the data.
With the amount of data companies use growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.
Watch Swami Sivasubramanian, Vice President of Data and AI at AWS, to discover how you can use your company data to build differentiated generative AI applications and accelerate productivity for employees across your organization.
Azure Synapse Analytics can be seen as a merge of Azure SQL Data Warehouse and Azure Data Lake. Synapse allows one to use SQL to query petabytes of data, both relational and non-relational, with amazing speed. I have not gotten a chance to try it out yet, so I am not yet sure of its use case for data science.
Data management problems can also lead to data silos: disparate collections of databases that don’t communicate with each other, leading to flawed analysis based on incomplete or incorrect datasets. One way to address this is to implement a data lake: a large repository of diverse datasets all stored in their original format.
Summary: This blog provides a comprehensive roadmap for aspiring Azure Data Scientists, outlining the essential skills, certifications, and steps to build a successful career in Data Science using Microsoft Azure.
The role of a data scientist is in demand, and 2023 will be no exception. To get a better grip on those changes, we reviewed over 25,000 data scientist job descriptions from the past year to find out what employers are looking for in 2023. Data Science: Of course, a data scientist should know data science!
Versioning also ensures a safer experimentation environment, where data scientists can test new models or hypotheses on historical data snapshots without impacting live data. Note: Cloud data warehouses like Snowflake and BigQuery already have a default time travel feature. FAQs: What is a Data Lakehouse?
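As a concrete illustration of the time travel feature mentioned above, Snowflake lets you query a table as it existed at an earlier point in time. Here is a minimal sketch using the Snowflake Python connector; the connection parameters and table name are placeholders, not anything from the original post.

    import snowflake.connector

    # Placeholder connection parameters.
    conn = snowflake.connector.connect(
        account="my_account",
        user="my_user",
        password="...",
        warehouse="ANALYTICS_WH",
        database="ANALYTICS_DB",
        schema="PUBLIC",
    )

    # Query the table as it looked one hour ago (Snowflake time travel).
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM sales AT (OFFSET => -3600)")
    print(cur.fetchone())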
DagsHub: DagsHub is a centralized GitHub-based platform that allows Machine Learning and Data Science teams to build, manage, and collaborate on their projects. In addition to versioning code, teams can also version data, models, experiments, and more. It versions tables instead of files and has a SQL query interface for those tables.
Jason McVay is a data scientist at Indigo Ag, an agriculture-tech company headquartered in Massachusetts. In this essay, Jason reflects on the value of thinking spatially about data, showing how his experience as a graduate student influences his role as a data scientist today. Spatial isn’t special.
Snowpark is the set of libraries and runtimes in Snowflake that securely deploy and process non-SQL code, including Python, Java, and Scala. As a declarative language, SQL is very powerful in allowing users from all backgrounds to ask questions about data. What is Snowflake’s Snowpark? Why Does Snowpark Matter?
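A minimal sketch of what Snowpark’s Python DataFrame API looks like, assuming an existing ORDERS table and placeholder connection parameters; the filtering and aggregation are pushed down to Snowflake rather than pulled into the client.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col

    # Placeholder connection parameters.
    connection_parameters = {
        "account": "my_account",
        "user": "my_user",
        "password": "...",
        "warehouse": "ANALYTICS_WH",
        "database": "SALES_DB",
        "schema": "PUBLIC",
    }

    session = Session.builder.configs(connection_parameters).create()

    # Filter and aggregate server-side; only the small result comes back to the client.
    large_orders = (
        session.table("ORDERS")
        .filter(col("AMOUNT") > 100)
        .group_by("REGION")
        .count()
    )
    large_orders.show()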
Solution overview: Amazon SageMaker is a fully managed service that helps developers and data scientists build, train, and deploy machine learning (ML) models. Processing these images and scanned documents is not a cost- or time-efficient task for humans, and requires highly performant infrastructure that can reduce the time to value.
A data lakehouse contains an organization’s data in unstructured, structured, and semi-structured form, which can be stored indefinitely for immediate or future use. This data is used by data scientists and engineers who study data to gain business insights.
Every day, millions of riders use the Uber app, unwittingly contributing to a complex web of data-driven decisions. This blog takes you on a journey into the world of Uber’s analytics and the critical role that Presto, the open source SQL query engine, plays in driving their success. What is Presto?
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
Overview: Data science vs. data analytics. Think of data science as the overarching umbrella that covers a wide range of tasks performed to find patterns in large datasets, structure data for use, train machine learning models, and develop artificial intelligence (AI) applications.
[link] Ahmad Khan, head of artificial intelligence and machine learning strategy at Snowflake, gave a presentation entitled “Scalable SQL + Python ML Pipelines in the Cloud” about his company’s Snowpark service at Snorkel AI’s Future of Data-Centric AI virtual conference in August 2022. Welcome everybody.
Its goal is to help with a quick analysis of target characteristics, training vs. testing data, and other such data characterization tasks. Apache Superset (GitHub | Website): Apache Superset is a must-try project for any ML engineer, data scientist, or data analyst.
These days, data scientists are in high demand. Across the country, data scientists have an unemployment rate of 2% and command an average salary of nearly $100,000. For these reasons, finding and evaluating data is often time-consuming. How Data Catalogs Help Data Scientists Ask Better Questions.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and ML to deliver the best price-performance at any scale. You can use query_string to filter your dataset by SQL and unload it to Amazon S3.
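One hedged way to express the "filter by SQL and unload to Amazon S3" idea is an UNLOAD statement submitted through the Redshift Data API; the cluster, IAM role, and bucket below are placeholders, not the setup described in the original post.

    import boto3

    redshift_data = boto3.client("redshift-data", region_name="us-east-1")

    # Filter the dataset with SQL and unload the result to S3 as Parquet (placeholder names).
    unload_sql = """
    UNLOAD ('SELECT * FROM sales WHERE region = ''EMEA''')
    TO 's3://example-bucket/exports/sales_emea_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
    FORMAT PARQUET;
    """

    response = redshift_data.execute_statement(
        ClusterIdentifier="example-cluster",
        Database="dev",
        DbUser="awsuser",
        Sql=unload_sql,
    )
    print(response["Id"])  # statement ID; check status with describe_statement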
Some popular end-to-end MLOps platforms in 2023: Amazon SageMaker. Amazon SageMaker provides a unified interface for data preprocessing, model training, and experimentation, allowing data scientists to collaborate and share code easily. Check out the Kubeflow documentation.
Choosing a Data Lake Format: What to Actually Look For. The differences between many data lake products today might not matter as much as you think. When choosing a data lake, here’s something else to consider. Data scientists and business leaders should consider these before investing.
SageMaker endpoints can be registered to the Salesforce Data Cloud to activate predictions in Salesforce. Salesforce Data Cloud and Einstein Studio: Salesforce Data Cloud is a data platform that provides businesses with real-time updates of their customer data from any touch point. For Callback URL, enter [link].studio.sagemaker.aws/canvas/default/lab
5 Reasons Why SQL is Still the Most Accessible Language for New Data Scientists: Between its ability to perform data analysis and its ease of use, here are 5 reasons why SQL is still ideal for new data scientists entering the field. But there’s a problem: how on earth do you hook up that GPU?
From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. As the number of ML-powered apps and services grows, it gets overwhelming for data scientists and ML engineers to build and deploy models at scale.
When it comes to data complexity, machine learning certainly deals with much more complex data. First of all, machine learning engineers and data scientists often use data from different data vendors. Some datasets are corrected by data entry specialists and manual inspectors.
This offers everyone from data scientists to advanced analysts to business users an intuitive, no-code environment that empowers quick and confident decisions guided by ethical, transparent AI. You can now connect to your data in Azure SQL Database (with Azure Active Directory) and Azure Data Lake Gen 2.
When done well, data democratization empowers employees with tools that let everyone work with data, not just the data scientists. When workers get their hands on the right data, it not only gives them what they need to solve problems, but also prompts them to ask, “What else can I do with data?”
Enterprises that use DataRobot on Azure get the knowledge, experience, and best practices of data scientists from DataRobot. The DataRobot AI Platform seamlessly integrates with Azure cloud services, including Azure Machine Learning, Azure Data Lake Storage Gen 2 (ADLS), Azure Synapse Analytics, and Azure SQL Database.
Start Learning AI With the ODSC West Data Primer Series: In this six-part series, part of the ODSC West mini-bootcamp, you’ll learn everything you need to know to get started with AI, including SQL, machine learning, and even LLMs. In addition, we’ll discuss a variety of tools that form the modern LLM application development stack.
Hive is a data warehousing infrastructure built on top of Hadoop. It has the following features: it facilitates querying, summarizing, and analyzing large datasets; it provides a SQL-like language called HiveQL; and it allows users to write queries to extract valuable insights from structured and semi-structured data stored in Hadoop.
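A tiny illustration of HiveQL over a table stored in Hadoop, run here through the PyHive client; the host, table, and column names are assumptions made for the sketch, not part of the original snippet.

    from pyhive import hive  # assumes a reachable HiveServer2 instance

    conn = hive.Connection(host="hive-server.example.com", port=10000, username="analyst")
    cur = conn.cursor()

    # Summarize a hypothetical web_logs table stored in Hadoop using SQL-like HiveQL.
    cur.execute(
        "SELECT status_code, COUNT(*) AS hits "
        "FROM web_logs GROUP BY status_code ORDER BY hits DESC LIMIT 10"
    )
    for row in cur.fetchall():
        print(row)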
Jupyter notebooks have been one of the most controversial tools in the data science community. Nevertheless, many data scientists will agree that they can be really valuable – if used well. Data on its own is not sufficient for a cohesive story, and it often lives not locally (e.g., in a pandas DataFrame) but in the company’s data warehouse (e.g.,
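When the data sits in the warehouse rather than in a local pandas DataFrame, a notebook typically pulls a query result down before analysis; here is a minimal sketch with SQLAlchemy and pandas, using a placeholder connection string and table.

    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder connection string; replace with your warehouse's SQLAlchemy URL and driver.
    engine = create_engine("postgresql://analyst:password@warehouse.example.com:5439/analytics")

    # Pull only the aggregated slice needed for the notebook, not the raw tables.
    df = pd.read_sql(
        "SELECT order_date, SUM(amount) AS revenue FROM sales GROUP BY order_date",
        engine,
    )
    print(df.head())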
Data Engineering is one of the most productive job roles today because it combines both the skills required for software engineering and programming and the advanced analytics needed by Data Scientists. How to Become an Azure Data Engineer? Data Warehousing concepts and knowledge should be strong. What is PolyBase?
Data engineering is a rapidly growing field, and there is a high demand for skilled data engineers. If you are a data scientist, you may be wondering if you can transition into data engineering. The good news is that many of the skills data scientists already have are transferable to data engineering.
This pushes into Big Data as well, as many companies now have significant amounts of data and large data lakes that need analyzing. While there’s a need for analyzing smaller datasets on your laptop, expanding into TB+ datasets requires a whole new set of skills and data analytics frameworks.
They are responsible for building and maintaining data architectures, which include databases, data warehouses, and data lakes. Their work ensures that data flows seamlessly through the organisation, making it easier for Data Scientists and Analysts to access and analyse information.