ML Engineer & Data Scientist

José P. Barrantes

I build production ML systems, data platforms, and fraud/risk solutions that connect modeling to business decisions.

In practice, that means designing ML platforms, data pipelines, and decision-support systems that can be trusted in production. I choose the modeling approach that best fits the problem, whether that is gradient boosting, Bayesian analysis, causal inference, deep learning, or a clear visualization.

Costa Rica
M.Sc. Computer Science · B.Sc. Biology
EN · ES
8+ Years in Data
50+ Projects Shipped
5 Industries
900+ Students Trained
Portfolio

Selected Projects

A collection of work across ML engineering, data science, and analytics.

MLOps Data Eng ML Eng

Feature Store for Time-Series Forecasting

2025 · Workshop @ DataDay Monterrey

A hands-on workshop demonstrating how to build a feature store for time-series forecasting using Mage for orchestration and Feast as the feature store. Presented at DataDay 2025 in Monterrey, Mexico.

Python Mage Feast Time Series Feature Store
Data Eng Data Science

High-Performance Analytics Pipeline

2025

End-to-end data pipeline on Iowa Liquor Sales (30M+ rows) covering the full data lifecycle: concurrent ingestion from the SODA API, transformation with Polars, persistent storage in DuckDB, geospatial analytics in R, and time-series forecasting with sktime. Accompanied by a hands-on tutorial article.

Data Engineering Mage Polars DuckDB R sktime
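
The concurrent ingestion step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `fetch_page`, `PAGE_SIZE`, and `TOTAL_ROWS` are hypothetical names, and the HTTP call is replaced by a stand-in so the example runs offline. It assumes limit/offset paging, the style SODA exposes through its `$limit` and `$offset` parameters.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 1000   # hypothetical page size
TOTAL_ROWS = 3500  # hypothetical dataset size for this sketch


def fetch_page(offset, limit=PAGE_SIZE):
    """Stand-in for one paged HTTP request (limit/offset style).

    A real implementation would issue a GET and parse the JSON response.
    """
    end = min(offset + limit, TOTAL_ROWS)
    return [{"invoice_id": i} for i in range(offset, end)]


def ingest(total_rows, page_size=PAGE_SIZE, max_workers=4):
    """Fetch all pages concurrently and flatten them into one list of rows."""
    offsets = range(0, total_rows, page_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the input offsets.
        pages = list(pool.map(fetch_page, offsets))
    return [row for page in pages for row in page]


rows = ingest(TOTAL_ROWS)
print(len(rows))
```

Threads suit this step because each request spends most of its time waiting on the network rather than on the CPU.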
ML Eng MLOps API

ML Model API Microservice

2024

Production-ready microservice predicting bike rentals with a CatBoost regressor. Features Pydantic request validation, automated unit/integration/inference tests with pytest, Docker containerization, and a CI/CD pipeline via GitHub Actions. Accompanied by a tutorial article on software engineering practices for data scientists.

FastAPI CatBoost Pydantic Docker pytest GitHub Actions
Data Science Stats

Iowa Liquor Sales Analysis

2024

Deep-dive into 12 years of Iowa liquor sales (~28M invoices). Covers STL decomposition, isolation forest anomaly detection (via PyCaret), and Bayesian estimation (BEST) to answer two questions: does inventory diversity drive sales, and did the pandemic increase alcohol consumption? Built in R with Python interop through reticulate, published as a reproducible Quarto document.

R Time Series Bayesian Inference Anomaly Detection Quarto
ML Eng Data Eng MLOps

Real-Time Fraud Prevention System

2025

End-to-end fraud prevention platform built as five Docker-composed microservices: a FastAPI data generator powered by the Synthetic Data Vault (SDV), Apache Kafka for event streaming, a Mage pipeline that consumes transactions and scores them with an ML model, and a Streamlit risk viewer (backed by DuckDB over Parquet) where analysts review high-risk cases.

Kafka Mage FastAPI Docker Compose Streamlit Fraud Detection
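
The score-then-review flow described above can be illustrated with a small sketch. Everything here is an assumption for illustration: the `Transaction` fields, the placeholder `score` function (a trained model would take its place), and the `REVIEW_THRESHOLD` value are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    transaction_id: str
    amount: float
    is_new_device: bool


REVIEW_THRESHOLD = 0.6  # hypothetical cut-off for analyst review


def score(tx):
    """Placeholder risk score in [0, 1]; a trained model would replace this."""
    risk = min(tx.amount / 5000.0, 1.0) * 0.7
    if tx.is_new_device:
        risk += 0.3
    return round(min(risk, 1.0), 3)


def route(transactions, threshold=REVIEW_THRESHOLD):
    """Split transactions into auto-approved IDs and a prioritized review queue."""
    approved, review_queue = [], []
    for tx in transactions:
        risk = score(tx)
        if risk >= threshold:
            review_queue.append((risk, tx.transaction_id))
        else:
            approved.append(tx.transaction_id)
    review_queue.sort(reverse=True)  # highest-risk cases first
    return approved, review_queue


stream = [
    Transaction("tx-001", 50.0, False),
    Transaction("tx-002", 4000.0, True),
    Transaction("tx-003", 5000.0, False),
]
approved, review_queue = route(stream)
print(approved, review_queue)
```

Sorting the queue by score means analysts see the riskiest cases first, which is the point of a human-in-the-loop viewer.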
GitHub
Speaking

Talks & Workshops

Sharing knowledge at conferences across Latin America.

Conference Talk

Fraud Prevention, ML & Human-in-the-Loop Design

2026 · Nerdearla · Santiago, Chile 🇨🇱

Talk exploring how fraud prevention, machine learning, and human-in-the-loop software design patterns intersect, and why keeping analysts as part of your system leads to better outcomes.

Fraud Prevention ML ML-enabled Systems Software Architecture Design Patterns
Workshop

Feature Store for Time-Series Forecasting

2025 · DataDay · Monterrey, Mexico 🇲🇽

Hands-on workshop building a feature store for multi-series forecasting on Iowa Liquor Sales data. Covers ETL pipelines in Mage, feature definitions and point-in-time correct training datasets in Feast, DuckDB as the offline store, and online materialization to SQLite for model serving.

MLOps Feature Store Feast Mage DuckDB Time-Series Forecasting
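
The idea behind the point-in-time correct training datasets mentioned above can be sketched in plain Python. This is a conceptual illustration only, not Feast's API: the feature rows, `build_index`, and `point_in_time_lookup` are hypothetical. For each training event, the lookup returns the latest feature value known at or before the event time, so no future information leaks into training.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical feature rows: (entity, date the value became available, value).
feature_rows = [
    ("store_1", date(2024, 1, 1), 100.0),
    ("store_1", date(2024, 1, 8), 120.0),
    ("store_1", date(2024, 1, 15), 90.0),
    ("store_2", date(2024, 1, 8), 40.0),
]


def build_index(rows):
    """Group feature rows by entity, with timestamps in ascending order."""
    index = {}
    for entity, ts, value in sorted(rows, key=lambda r: (r[0], r[1])):
        timestamps, values = index.setdefault(entity, ([], []))
        timestamps.append(ts)
        values.append(value)
    return index


def point_in_time_lookup(index, entity, event_ts):
    """Return the latest value known at or before event_ts, or None."""
    if entity not in index:
        return None
    timestamps, values = index[entity]
    pos = bisect_right(timestamps, event_ts)
    return values[pos - 1] if pos > 0 else None


index = build_index(feature_rows)
training_events = [
    ("store_1", date(2024, 1, 10)),   # sees the Jan 8 value, not Jan 15
    ("store_1", date(2023, 12, 31)),  # no feature available yet
    ("store_2", date(2024, 1, 8)),    # same-day value is allowed
]
training_rows = [
    (entity, ts, point_in_time_lookup(index, entity, ts))
    for entity, ts in training_events
]
print(training_rows)
```

A naive join on the latest value would give the first event the Jan 15 feature, which was not yet known on Jan 10.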
Toolkit

Skills & Technologies

Statistics & Analytics
Inference · Bayesian Data Analysis
Inference · Frequentist Inference
Inference · Causal Inference
Impact · Counterfactual Analysis
Time · Time Series Analysis & Forecasting
Design · Experimental Design & A/B Testing
Viz · Data Visualization
Viz · Data Storytelling
Practice · Uncertainty Quantification
Tools · Apache Superset
Machine Learning
Framework · Scikit-Learn
Framework · PyTorch
Framework · CatBoost
Method · Supervised Learning
Method · Gradient Boosting
Method · Time Series Forecasting
Method · Feature Engineering
Practice · Model Validation
Practice · Explainable / Interpretable ML
Practice · Model Evaluation & Error Analysis
Practice · Human-in-the-Loop ML
MLOps & Software Architecture
Architecture · ML Systems Architecture
Platform · ML Platform Design
Orchestration · Mage
Registry · MLflow
Architecture · Feast / Feature Stores
API · FastAPI
Containers · Docker
Infra · Kubernetes
CI/CD · GitHub Actions
Pattern · Human-in-the-Loop Systems
Linux & Infrastructure
OS · Linux
Shell · Bash / Zsh
VCS · Git
Container · Docker
Orchestration · Kubernetes
Cloud · IBM Cloud
Cloud · AWS
Config · Secrets & Environment Management
Fraud Prevention & Risk Systems
Domain · Fraud Detection & Analytics
Domain · Risk Scoring & Assessment
Pattern · Real-Time Detection Systems
Pattern · Human-in-the-Loop Review
Pattern · Decision Support Systems
Ops · Case Prioritization & Triage
Data Platforms & Data Engineering
SQL · PostgreSQL
OLAP · DuckDB
OLAP · ClickHouse
OLAP · Amazon Athena
Processing · Polars
Processing · Pandas / NumPy
Pipeline · ETL / ELT Pipelines
Pipeline · Feature Engineering Pipelines
Modeling · Data Modeling
Core Languages
Primary · Python
Primary · R / Tidyverse
Primary · SQL (PostgreSQL, ANSI)
Applied Domains
Finance · Fraud Prevention
Finance · Financial Risk
Industry · Fintech
Industry · Payments
Forecasting · Retail & Sales Analytics
Business · Marketing Analytics
Science · Computational Biology
Health · Healthcare & Clinical Data
Get in touch

Let's Connect

Open to collaborations, interesting problems, and good coffee.

Email Me GitHub LinkedIn