ML Engineer & Data Scientist

José P. Barrantes

I build production ML systems, data platforms, and fraud prevention solutions that connect modeling to business decisions.

In practice, that means designing ML platforms, data pipelines, and decision-support systems that can be trusted in production. I choose the modeling approach that best fits the problem, whether that involves gradient boosting, Bayesian analysis, causal inference, deep learning, or a clear visualization.

Costa Rica

M.Sc. Computer Science · B.Sc. Biology

EN · ES

Years in Data

50+ Projects Shipped

Industries

900+ Students Trained

Portfolio

Selected Projects

A collection of work across ML engineering, data science, and analytics.

MLOps Data Eng ML Eng

Feature Store for Time-Series Forecasting

2025 · Workshop @ DataDay Monterrey

Companion codebase for the DataDay 2025 workshop in Monterrey, Mexico: a working feature store for time-series forecasting, with Mage handling orchestration and Feast serving point-in-time correct features for training and inference.

Python Mage Feast Time Series Feature Store

Talk Page GitHub
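
To make "point-in-time correct" concrete, here is a minimal sketch using only the Python standard library. It is not Feast's API or the workshop code; the feature rows, dates, and the point_in_time_lookup helper are hypothetical and exist only to illustrate the idea that a training row may see feature values recorded on or before its own timestamp, never after.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature history for one store: (as-of date, 7-day rolling sales).
feature_rows = [
    (date(2024, 1, 1), 100.0),
    (date(2024, 1, 8), 120.0),
    (date(2024, 1, 15), 90.0),
]


def point_in_time_lookup(rows, event_date):
    """Return the latest feature value recorded on or before event_date.

    rows must be sorted by date. Returns None when no value was known yet,
    so a training row never sees information from its own future.
    """
    dates = [d for d, _ in rows]
    idx = bisect_right(dates, event_date)
    return rows[idx - 1][1] if idx > 0 else None


# Illustrative label timestamps for training examples.
training_events = [
    date(2023, 12, 31),
    date(2024, 1, 8),
    date(2024, 1, 10),
    date(2024, 2, 1),
]
training_set = [(d, point_in_time_lookup(feature_rows, d)) for d in training_events]
```

The first event predates every feature row, so it gets None rather than a value leaked from a later date.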

Data Eng Data Science

High-Performance Analytics Pipeline

2025

End-to-end data pipeline on Iowa Liquor Sales (30M+ rows) covering the full data lifecycle: concurrent ingestion from the SODA API, transformation with Polars, persistent storage in DuckDB, geospatial analytics in R, and time-series forecasting with sktime. Accompanied by a hands-on tutorial article.

Data Engineering Mage Polars DuckDB R sktime

Read Article GitHub
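
The concurrent ingestion step can be sketched roughly as follows. This is an offline illustration, not the project's code: fetch_page is a hypothetical stub standing in for one offset/limit request to a paged API, and the page size, row count, and worker count are made-up values.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 1000   # hypothetical page size
TOTAL_ROWS = 3500  # hypothetical dataset size, stands in for the real API


def fetch_page(offset, limit=PAGE_SIZE):
    """Stand-in for one paged HTTP request (offset/limit style).

    A real version would call the API here; this stub generates row ids
    so the sketch runs without network access.
    """
    end = min(offset + limit, TOTAL_ROWS)
    return [{"row_id": i} for i in range(offset, end)]


def ingest_all(total_rows=TOTAL_ROWS, page_size=PAGE_SIZE, workers=4):
    """Fetch all pages concurrently and return rows in offset order."""
    offsets = range(0, total_rows, page_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order even when pages finish out of order.
        pages = pool.map(fetch_page, offsets)
        return [row for page in pages for row in page]


rows = ingest_all()
```

Threads suit this step because the work is I/O-bound: most of the time is spent waiting on responses, not computing.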

ML Eng MLOps API

ML Model API Microservice

2024

Production-ready microservice predicting bike rentals with a CatBoost regressor. Features Pydantic request validation, automated unit/integration/inference tests with pytest, Docker containerization, and a CI/CD pipeline via GitHub Actions. Accompanied by a tutorial article on software engineering practices for data scientists.

FastAPI CatBoost Pydantic Docker pytest GitHub Actions

Read Article GitHub
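
As an illustration of request validation, here is a minimal sketch that uses a standard-library dataclass as a stand-in for a Pydantic model, so it runs without third-party packages. The field names, ranges, and the parse_request helper are assumptions made for this example, not the service's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RentalRequest:
    """Hypothetical request payload for a bike-rental prediction endpoint."""

    hour: int             # hour of day, 0-23
    temperature_c: float  # air temperature in Celsius
    is_holiday: bool

    def __post_init__(self):
        # Reject malformed input before it reaches the model.
        if isinstance(self.hour, bool) or not isinstance(self.hour, int):
            raise ValueError("hour must be an integer")
        if not 0 <= self.hour <= 23:
            raise ValueError("hour must be between 0 and 23")
        if not -50.0 <= float(self.temperature_c) <= 60.0:
            raise ValueError("temperature_c is outside the plausible range")


def parse_request(payload: dict) -> RentalRequest:
    """Build a validated request from a decoded JSON body."""
    try:
        return RentalRequest(**payload)
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"invalid payload: {exc}") from exc
```

Calling parse_request({"hour": 24, "temperature_c": 20.0, "is_holiday": False}) raises ValueError, which an API layer would typically translate into a client error response.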

Data Science Stats

Iowa Liquor Sales Analysis

2024

Deep-dive into 12 years of Iowa liquor sales (~28M invoices). Covers STL decomposition, isolation forest anomaly detection (via PyCaret), and Bayesian estimation (BEST) to answer two questions: does inventory diversity drive sales, and did the pandemic increase alcohol consumption? Built in R with Python interop through reticulate, published as a reproducible Quarto document.

R Time Series Bayesian Inference Anomaly Detection Quarto

Read Article GitHub

ML Eng Data Eng MLOps

Real-Time Fraud Prevention System

2025

End-to-end fraud prevention platform built as five Docker-composed microservices: a FastAPI data generator powered by the Synthetic Data Vault (SDV), Apache Kafka for event streaming, a Mage pipeline that consumes transactions and scores them with an ML model, and a Streamlit risk viewer (backed by DuckDB over Parquet) where analysts review high-risk cases.

Kafka Mage FastAPI Docker Compose Streamlit Fraud Detection

GitHub
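
The score-then-review flow can be sketched in memory as below. Everything here is a simplified assumption: the toy score function stands in for the trained model, a Python list stands in for the Kafka stream, ReviewQueue stands in for the analyst-facing viewer, and the 0.8 threshold is arbitrary.

```python
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.8  # hypothetical cut-off; a real system would tune this


@dataclass
class Transaction:
    tx_id: str
    amount: float
    country_mismatch: bool


def score(tx: Transaction) -> float:
    """Toy risk score in [0, 1]; stands in for the trained model."""
    risk = min(tx.amount / 10_000, 1.0) * 0.6
    if tx.country_mismatch:
        risk += 0.4
    return round(min(risk, 1.0), 3)


@dataclass
class ReviewQueue:
    """High-risk cases wait here for an analyst instead of being auto-blocked."""

    cases: list = field(default_factory=list)

    def route(self, tx: Transaction) -> str:
        risk = score(tx)
        if risk >= REVIEW_THRESHOLD:
            self.cases.append((risk, tx.tx_id))
            # Highest risk first, so analysts triage the worst cases early.
            self.cases.sort(reverse=True)
            return "review"
        return "approve"


queue = ReviewQueue()
stream = [
    Transaction("t1", 120.0, False),
    Transaction("t2", 9_500.0, True),
    Transaction("t3", 8_000.0, True),
]
decisions = [queue.route(tx) for tx in stream]
```

The point of the design is that the model only prioritizes: a high score sends a case to a person rather than triggering an automatic block.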

Speaking

Talks & Workshops

Sharing knowledge at conferences across Latin America.

Conference Talk

Fraud Prevention, ML & Human-in-the-Loop Design

2026 · Nerdearla · Santiago, Chile 🇨🇱

Talk exploring the intersection of fraud prevention, machine learning, and human-in-the-loop software design patterns. The core idea: systems that keep analysts in the loop produce better outcomes than full automation.

Fraud Prevention ML ML-enabled Systems Software Architecture Design Patterns

Talk Materials Talk Page Nerdflix

Workshop

Feature Store for Time-Series Forecasting

2025 · DataDay · Monterrey, Mexico 🇲🇽

Hands-on workshop building a feature store for multi-series forecasting on Iowa Liquor Sales data. Covers ETL pipelines in Mage, feature definitions and point-in-time correct training datasets in Feast, DuckDB as the offline store, and online materialization to SQLite for model serving.

MLOps Feature Store Feast Mage DuckDB Time-Series Forecasting

Workshop Materials Talk Page
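
Online materialization, in rough terms, copies the newest feature values per entity into a store built for fast single-key lookups. The sketch below shows that idea with Python's built-in sqlite3 module; it does not use Feast, and the table name, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical offline feature rows: (store_id, as-of date, 7-day sales).
offline_rows = [
    ("store_1", "2024-01-01", 100.0),
    ("store_1", "2024-01-08", 120.0),
    ("store_2", "2024-01-01", 40.0),
    ("store_2", "2024-01-08", 55.0),
]


def materialize(conn, rows):
    """Keep only the newest feature row per entity in the online table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS online_features ("
        "store_id TEXT PRIMARY KEY, as_of TEXT, sales_7d REAL)"
    )
    # Process oldest first so later rows overwrite earlier ones.
    for store_id, as_of, value in sorted(rows, key=lambda r: r[1]):
        conn.execute(
            "INSERT OR REPLACE INTO online_features VALUES (?, ?, ?)",
            (store_id, as_of, value),
        )
    conn.commit()


def get_online_features(conn, store_id):
    """Single-key lookup, the access pattern a serving path needs."""
    return conn.execute(
        "SELECT as_of, sales_7d FROM online_features WHERE store_id = ?",
        (store_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
materialize(conn, offline_rows)
```

After materialization, get_online_features(conn, "store_1") returns the 2024-01-08 row, and an unknown store returns None.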

Toolkit

Skills & Technologies

Statistics & Analytics

Inference: Bayesian Data Analysis
Inference: Frequentist Inference
Inference: Causal Inference
Impact: Counterfactual Analysis
Time: Time Series Analysis & Forecasting
Design: Experimental Design & A/B Testing
Viz: Data Visualization
Viz: Data Storytelling
Practice: Uncertainty Quantification
Tools: Apache Superset

Machine Learning

Framework: Scikit-Learn
Framework: PyTorch
Framework: CatBoost
Method: Supervised Learning
Method: Gradient Boosting
Method: Time Series Forecasting
Method: Feature Engineering
Practice: Model Validation
Practice: Explainable / Interpretable ML
Practice: Model Evaluation & Error Analysis
Practice: Human-in-the-Loop ML

MLOps & Software Architecture

Architecture: ML Systems Architecture
Platform: ML Platform Design
Orchestration: Mage
Registry: MLflow
Architecture: Feast / Feature Stores
API: FastAPI
Containers: Docker
Infra: Kubernetes
CI/CD: GitHub Actions
Pattern: Human-in-the-Loop Systems

Linux & Infrastructure

Linux
Shell: Bash / Zsh
VCS: Git
Container: Docker
Orchestration: Kubernetes
Cloud: IBM Cloud
Cloud: AWS
Config: Secrets & Environment Management

Fraud Prevention & Risk Systems

Domain: Fraud Detection & Analytics
Domain: Risk Scoring & Assessment
Pattern: Real-Time Detection Systems
Pattern: Human-in-the-Loop Review
Pattern: Decision Support Systems
Ops: Case Prioritization & Triage

Data Platforms & Data Engineering

SQL: PostgreSQL
OLAP: DuckDB
OLAP: ClickHouse
OLAP: Amazon Athena
Processing: Polars
Processing: Pandas / NumPy
Pipeline: ETL / ELT Pipelines
Pipeline: Feature Engineering Pipelines
Modeling: Data Modeling

Core Languages

Primary: Python
Primary: R / Tidyverse
Primary: SQL (PostgreSQL, ANSI)

Applied Domains

Finance: Fraud Prevention
Finance: Financial Risk
Industry: Fintech
Industry: Payments
Forecasting: Retail & Sales Analytics
Business: Marketing Analytics
Science: Computational Biology
Health: Healthcare & Clinical Data
