Communication between the two systems was established through Kerberized Apache Livy (HTTPS) connections over AWS PrivateLink. Data exploration and model development were conducted using well-known machine learning (ML) tools such as Jupyter or Apache Zeppelin notebooks.
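As a rough illustration of that setup, here is a minimal sketch of submitting a Spark batch job to a Kerberized Livy endpoint over HTTPS; the hostname, S3 path, and CA bundle path are placeholders, and it assumes the requests and requests-kerberos packages plus a valid Kerberos ticket.

```python
# Minimal sketch: submit a Spark batch to a Kerberized Apache Livy endpoint over HTTPS.
# Hostname, job artifact, and CA bundle are placeholders, not values from the article.
import requests
from requests_kerberos import HTTPKerberosAuth, REQUIRED

LIVY_URL = "https://livy.example.internal:8998"  # hypothetical PrivateLink-resolved hostname

payload = {
    "file": "s3://my-bucket/jobs/feature_pipeline.py",  # placeholder job artifact
    "name": "feature-pipeline",
    "conf": {"spark.executor.memory": "4g"},
}

resp = requests.post(
    f"{LIVY_URL}/batches",
    json=payload,
    auth=HTTPKerberosAuth(mutual_authentication=REQUIRED),  # Kerberos (SPNEGO) auth
    verify="/etc/pki/ca-trust/internal-ca.pem",  # validate the internal TLS certificate
)
resp.raise_for_status()
print(resp.json()["id"], resp.json()["state"])
```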
Their role is crucial in understanding the underlying data structures and how to leverage them for insights. Key Skills: Proficiency in SQL is essential, along with experience in data visualization tools such as Tableau or Power BI. Programming Questions: Data science roles typically require knowledge of Python, SQL, R, or Hadoop.
So why use IaC for cloud data infrastructures? It ensures that the data models and queries developed by data professionals are consistent with the underlying infrastructure. Enhanced Security and Compliance: Data warehouses often store sensitive information, making security a paramount concern.
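To make the IaC point concrete, here is a hedged sketch using the AWS CDK (v2) in Python to declare a small slice of data infrastructure; the bucket and Glue database names are illustrative only, not taken from the article.

```python
# Hedged sketch: declaring data infrastructure as code with the AWS CDK (v2, Python).
# Resource names are illustrative placeholders.
from aws_cdk import App, Stack, aws_s3 as s3, aws_glue as glue
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Versioned raw-zone bucket; encryption defaults to S3-managed keys.
        s3.Bucket(self, "RawZone", versioned=True)

        # Glue database that downstream data models and queries can rely on.
        glue.CfnDatabase(
            self,
            "AnalyticsDb",
            catalog_id=self.account,
            database_input=glue.CfnDatabase.DatabaseInputProperty(name="analytics"),
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```

Because the warehouse, lake buckets, and catalog are declared in version-controlled code, changes go through the same review and audit process as application code, which is where the security and compliance benefit comes from.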
New big data architectures and, above all, data sharing concepts such as Data Mesh are ideal for creating a common database for many data products and applications. The Event Log Data Model for Process Mining: Process mining as an analytical system can very well be imagined as an iceberg.
In this blog post, we will be discussing 7 tips that will help you become a successful data engineer and take your career to the next level. Learn SQL: As a data engineer, you will be working with large amounts of data, and SQL is the most commonly used language for interacting with databases.
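For a first SQL exercise, something as small as the following self-contained script covers the everyday pattern of loading rows and aggregating them; it uses Python's built-in sqlite3 module, and the table and column names are made up for illustration.

```python
# A small, self-contained SQL warm-up using Python's built-in sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 200.0)],
)

# A typical aggregation a data engineer runs every day: revenue per customer.
for row in conn.execute(
    "SELECT customer, SUM(amount) AS revenue "
    "FROM orders GROUP BY customer ORDER BY revenue DESC"
):
    print(row)
```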
With the rapid growth of generative artificial intelligence (AI), many AWS customers are looking to take advantage of publicly available foundation models (FMs) and technologies. This includes Meta Llama 3, Meta’s publicly available large language model (LLM).
In addition to its groundbreaking AI innovations, Zeta Global has harnessed Amazon Elastic Container Service (Amazon ECS) with AWS Fargate to deploy a multitude of smaller models efficiently. Additionally, Feast promotes feature reuse, so the time spent on data preparation is reduced greatly.
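As a rough sketch of what feature reuse with Feast looks like in code, the snippet below fetches two previously registered features by name instead of recomputing them; the feature view, feature names, and entity key are hypothetical and assume an already initialized feature repository.

```python
# Hedged sketch of feature reuse with Feast: once features are registered in a repo,
# any model can fetch them by name instead of re-deriving them from raw data.
from feast import FeatureStore

# Assumes a feature_store.yaml in the current directory (hypothetical repo).
store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=[
        "user_stats:purchases_30d",     # hypothetical feature view : feature
        "user_stats:avg_order_value",
    ],
    entity_rows=[{"user_id": 1001}, {"user_id": 1002}],  # hypothetical entity key
).to_dict()

print(features)
```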
Here are a few of the things that you might do as an AI Engineer at TigerEye: - Design, develop, and validate statistical models to explain past behavior and to predict future behavior of our customers’ sales teams - Own training, integration, deployment, versioning, and monitoring of ML components - Improve TigerEye’s existing metrics collection and (..)
You can streamline the process of feature engineering and data preparation with SageMaker Data Wrangler and finish each stage of the data preparation workflow (including data selection, purification, exploration, visualization, and processing at scale) within a single visual interface.
Amazon Redshift: Amazon Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). Amazon Redshift allows data engineers to analyze large datasets quickly using massively parallel processing (MPP) architecture. It is known for its high performance and cost-effectiveness.
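A minimal sketch of querying Redshift from Python with the redshift_connector driver might look like the following; the cluster endpoint, credentials, and table are placeholders.

```python
# Hedged sketch: querying Amazon Redshift from Python with the redshift_connector driver.
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder endpoint
    database="analytics",
    user="analyst",
    password="********",
)

cur = conn.cursor()
# Redshift's MPP engine spreads this scan and aggregation across node slices.
cur.execute(
    "SELECT event_date, COUNT(*) AS views "
    "FROM page_views GROUP BY event_date ORDER BY event_date"
)
for event_date, views in cur.fetchall():
    print(event_date, views)

cur.close()
conn.close()
```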
With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the Amazon Web Services (AWS) tools without having to manage infrastructure. The following screenshot shows what the upload looks like when it’s complete.
You can only deploy DynamoDB on Amazon Web Services (AWS), and it does not support on-premises deployments. With DynamoDB, you are essentially locked into AWS as your cloud provider. MongoDB is deployable anywhere, and the MongoDB Atlas database-as-a-service can be deployed on AWS, Azure, and Google Cloud Platform (GCP).
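For reference, basic DynamoDB access through boto3 looks roughly like the sketch below; the table name, partition key, and item are illustrative and assume the table already exists in your AWS account.

```python
# Hedged sketch of basic DynamoDB access with boto3.
# Assumes an existing table with a partition key named "pk" (hypothetical).
import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("customer-profiles")  # placeholder table name

table.put_item(Item={"pk": "cust#1001", "name": "Acme Corp", "tier": "gold"})

resp = table.get_item(Key={"pk": "cust#1001"})
print(resp.get("Item"))
```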
However, to fully harness the potential of a data lake, effective data modeling methodologies and processes are crucial. Data modeling plays a pivotal role in defining the structure, relationships, and semantics of data within a data lake, and in maintaining consistency of data throughout the data lake.
Forecast uses ML to learn not only the best algorithm for each item, but also the best ensemble of algorithms for each item, automatically creating the best model for your data. The console and AWS CLI methods are best suited for quick experimentation to check the feasibility of time series forecasting using your data.
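The programmatic (boto3) route is a thin wrapper over the same idea; the sketch below is illustrative, with placeholder ARNs and names, and assumes a dataset group has already been created and its time series data imported.

```python
# Hedged sketch of creating an auto predictor with Amazon Forecast via boto3.
# ARNs and names are placeholders; Forecast chooses the model/ensemble itself.
import boto3

forecast = boto3.client("forecast", region_name="us-east-1")

predictor = forecast.create_auto_predictor(
    PredictorName="demand-predictor",
    ForecastHorizon=14,        # predict 14 future periods
    ForecastFrequency="D",     # daily data
    DataConfig={
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/demand"
    },
)
print(predictor["PredictorArn"])
```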
By maintaining historical data from disparate locations, a data warehouse creates a foundation for trend analysis and strategic decision-making. How to Choose a Data Warehouse for Your Big Data Choosing a data warehouse for big data storage necessitates a thorough assessment of your unique requirements.
The June 2021 release of Power BI Desktop introduced Custom SQL queries to Snowflake in DirectQuery mode. However, Snowflake runs better on Azure than it does on AWS – so even though it’s not the ideal situation, Microsoft still sees Azure consumption when organizations host Snowflake on Azure.
Advanced tools like Amazon QuickSight support large datasets and growing businesses. Microsoft Power BI is a comprehensive business intelligence (BI) tool designed to help organisations turn raw data into meaningful insights. It supports various visualisations and can connect to various SQL-based data sources.
Looker's strength lies in its ability to connect to a wide variety of data sources, including SQL databases, data warehouses (DWH), and cloud-based systems such as Google BigQuery. With Looker, you can share dashboards and visualizations seamlessly across teams, providing stakeholders with access to real-time data.
Understand the fundamentals of data engineering: To become an Azure Data Engineer, you must first understand the concepts and principles of data engineering. Knowledge of data modeling, warehousing, integration, pipelines, and transformation is required, along with hands-on experience working with Azure SQL Data Warehouse (SQL DW) and Azure SQL Database (SQL DB).
The answer probably depends more on the complexity of your queries than on the connectedness of your data: relational databases (with recursive SQL queries), document stores, key-value stores, and so on. Multi-model databases combine graphs with two other NoSQL data models: document and key-value stores.
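The "recursive SQL queries" option is easy to demonstrate: the self-contained example below uses SQLite's WITH RECURSIVE to walk a reporting hierarchy stored in an ordinary relational table (the table and data are made up).

```python
# Connected data in a plain relational database: walk a hierarchy with WITH RECURSIVE.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "dana", None), (2, "lee", 1), (3, "kim", 2), (4, "ravi", 2)],
)

query = """
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 0 FROM employees WHERE id = 1
    UNION ALL
    SELECT e.id, e.name, r.depth + 1
    FROM employees e JOIN reports r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth;
"""
for name, depth in conn.execute(query):
    print(name, depth)
```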
Data engineers are essential professionals responsible for designing, constructing, and maintaining an organization’s data infrastructure. They create data pipelines, ETL processes, and databases to facilitate smooth data flow and storage. Data Visualization: Matplotlib, Seaborn, Tableau, etc.
Summary: The fundamentals of Data Engineering encompass essential practices like data modelling, warehousing, pipelines, and integration. Understanding these concepts enables professionals to build robust systems that facilitate effective data management and insightful analysis. What is Data Engineering?
Summary: Choosing the right ETL tool is crucial for seamless data integration. Top contenders like Apache Airflow and AWS Glue offer unique features, empowering businesses with efficient workflows, high data quality, and informed decision-making capabilities. Scalability: Designed to handle large volumes of data efficiently.
Generative AI can be used to automate the data modeling process by generating entity-relationship diagrams or other types of data models, and to assist in the UI design process by generating wireframes or high-fidelity mockups. GPT-4 Data Pipelines: Transform JSON to SQL Schema Instantly (Blockstream's public Bitcoin API).
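As a small, LLM-free illustration of the JSON-to-SQL-schema idea, the sketch below infers a CREATE TABLE statement from a sample JSON record; the field names and the deliberately naive type mapping are assumptions for demonstration only.

```python
# Hedged sketch: derive a SQL CREATE TABLE statement from a sample JSON record.
# Field names and the naive type mapping are illustrative, not from the article.
import json

sample = json.loads(
    '{"txid": "ab12", "block_height": 840000, "fee": 0.00021, "confirmed": true}'
)

def sql_type(value):
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE PRECISION"
    return "TEXT"

columns = ",\n  ".join(f"{key} {sql_type(val)}" for key, val in sample.items())
print(f"CREATE TABLE transactions (\n  {columns}\n);")
```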
Estimates start at around $2 million, ranging up to $12 million or so for the newest (and largest) models. Facebook/Meta’s LLaMA, which is smaller than GPT-3 and GPT-4, is thought to have taken roughly one million GPU hours to train, which would cost roughly $2 million on AWS. However, very few companies need to build their own models.
If you ask data professionals what the most challenging part of their day-to-day work is, you will likely discover their concerns around managing different aspects of data before they get to graduate to the data modeling stage. Pros: Uses secure protocols for data security. Cons: Limited connectors.
For example, if you use AWS, you may prefer Amazon SageMaker as an MLOps platform that integrates with other AWS services. SageMaker Studio offers built-in algorithms, automated model tuning, and seamless integration with AWS services, making it a powerful platform for developing and deploying machine learning solutions at scale.
Support for Numerous Data Sources: Fivetran supports over 200 data sources, including popular databases, applications, and cloud platforms like Salesforce, Google Analytics, SQL Server, Snowflake, and many more. Additionally, unsupported data sources can be integrated using Fivetran’s cloud function connectors.
Here’s the structured equivalent of this same data in tabular form: With structured data, you can use query languages like SQL to extract and interpret information. In contrast, such traditional query languages struggle to interpret unstructured data. This text has a lot of information, but it is not structured.
It’s about more than just looking at one project; dbt Explorer lets you see the lineage across different projects, ensuring you can track your data’s journey end-to-end without losing track of the details. Figure 3: Multi-project lineage graph with dbt Explorer. Source: Dave Connor's Loom.
Model Deployment and Serving Platforms: Some of the most popular tools for development, serving, and scaling are as follows. Amazon SageMaker: Developed by Amazon Web Services (AWS), Amazon SageMaker is a fully managed machine learning service that allows developers and data scientists to build, train, and deploy machine learning models at scale.
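A hedged sketch of the train-then-deploy flow with the SageMaker Python SDK is shown below; the container image URI, IAM role ARN, and S3 paths are placeholders.

```python
# Hedged sketch: train a model and deploy it to a real-time endpoint with the
# SageMaker Python SDK. Image URI, role ARN, and S3 paths are placeholders.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",
    sagemaker_session=session,
)

# Launch a managed training job against data in S3.
estimator.fit({"train": "s3://my-bucket/training-data/"})

# Deploy the trained model behind a real-time HTTPS endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.endpoint_name)
```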
You see them all the time with a headline like: “data science, machine learning, Java, Python, SQL, or blockchain, computer vision.” It’s almost like a specialized data processing and storage solution. For example, you can use BigQuery, AWS, or Azure. How awful are they? It’s two things.
Text-to-SQL empowers people to explore data and draw insights using natural language, without requiring specialized database knowledge. Amazon Web Services (AWS) has helped many customers connect this text-to-SQL capability with their own data, which means more employees can generate insights.
SQL is one of the key languages widely used across businesses, and it requires an understanding of databases and table metadata. This can be overwhelming for nontechnical users who lack proficiency in SQL. This application allows users to ask questions in natural language and then generates a SQL query for the user's request.
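One plausible shape for such an application is sketched below using the Amazon Bedrock Converse API via boto3; the model ID, table metadata, and question are examples only, and any generated SQL should be validated before it is run against a database.

```python
# Hedged sketch of a text-to-SQL call using the Amazon Bedrock Converse API (boto3).
# Model ID, schema, and question are examples; review generated SQL before executing it.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

schema = "TABLE orders(order_id INT, customer TEXT, amount DECIMAL, order_date DATE)"
question = "What was total revenue per customer last month?"

prompt = (
    f"Given this table metadata:\n{schema}\n"
    f"Write a single SQL query answering: {question}\n"
    "Return only the SQL."
)

response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)

print(response["output"]["message"]["content"][0]["text"])
```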
Uncomfortable reality: In the era of large language models (LLMs) and AutoML, traditional skills like Python scripting, SQL, and building predictive models are no longer enough for data scientists to remain competitive in the market. Programming expertise: A medium/high proficiency in Python and SQL is enough.
With the Amazon Bedrock serverless experience, you can experiment with and evaluate top foundation models (FMs) for your use cases, privately customize them with your data using techniques such as fine-tuning and RAG, and build agents that run tasks using enterprise systems and data sources.
Summary: Data engineering tools streamline data collection, storage, and processing. Tools like Python, SQL, Apache Spark, and Snowflake help engineers automate workflows and improve efficiency. Learning these tools is crucial for building scalable data pipelines.
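As a small illustration of the kind of pipeline these tools enable, here is a hedged PySpark sketch that rolls raw JSON events up into daily counts; the S3 paths and column names are placeholders.

```python
# Hedged sketch of a small PySpark batch job: roll raw JSON events into daily counts.
# S3 paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-events-rollup").getOrCreate()

events = spark.read.json("s3://my-bucket/raw/events/")  # assumes JSON event files

daily = (
    events
    .withColumn("event_date", F.to_date("event_time"))
    .groupBy("event_date", "event_type")
    .count()
)

daily.write.mode("overwrite").parquet("s3://my-bucket/curated/daily_event_counts/")
spark.stop()
```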