The role of containers on MLOps and model production

Container technology has changed the way data science gets done. The original container use case for data science focused on what I call “environment management”. Configuring software environments is a constant chore, especially in the open source software space, the space in wh … | Continue reading


@blog.dominodatalab.com | 3 years ago

Snowflake and Domino: Better Together

Introduction Arming data science teams with the access and capabilities needed to establish a two-way flow of information is one critical challenge many organizations face when it comes to unlocking value from their modeling efforts. Part of this challenge is that many organiza … | Continue reading


@blog.dominodatalab.com | 3 years ago

Faster data exploration in Jupyter through Lux

Notebooks have become one of the primary tools for many data scientists. They offer a clear way to collaborate with others throughout the process of data exploration, feature engineering, and model fitting and, by following some clear best practices, can also become living … | Continue reading


@blog.dominodatalab.com | 3 years ago

Analyzing Large P Small N Data

Guest Post by Bill Shannon, Co-Founder and Managing Partner of BioRankings Introduction High throughput screening technologies have been developed to measure all the molecules of interest in a sample in a single experiment (e.g., the entire genome, the amounts of metabolites, the … | Continue reading


@blog.dominodatalab.com | 3 years ago

Performing non-compartmental analysis with Julia and Pumas AI

When analysing pharmacokinetic data to determine the degree of exposure of a drug and associated pharmacokinetic parameters (e.g., clearance, elimination half-life, maximum observed concentration (Cmax), time where the maximum concentration was observed (Tmax), Non-Compartmental Analysis … | Continue reading


@blog.dominodatalab.com | 3 years ago

Density-Based Clustering

Original content by Manojit Nandi – Updated by Josh Poduska. Cluster Analysis is an important problem in data analysis. Data scientists use clustering to identify malfunctioning servers, group genes with similar expression patterns, and perform various other applications. There a … | Continue reading


@blog.dominodatalab.com | 3 years ago

Bringing ML to agriculture: Transforming a millennia-old industry

Guest post by Jeff Melching from The Climate Corporation At The Climate Corporation, we aim to help farmers better understand their operations and make better decisions to increase their crop yields in a sustainable way. We’ve developed a model-driven software platform, called Cl … | Continue reading


@blog.dominodatalab.com | 3 years ago

Why models fail to deliver value and what you can do about it

Building models requires a lot of time and effort. Data scientists can spend weeks just trying to find, capture and transform data into decent features for models, not to mention many cycles of training, tuning, and tweaking models so they’re performant. Yet despite all this hard … | Continue reading


@blog.dominodatalab.com | 3 years ago

Evaluating Ray: Distributed Python for Scalability

Dean Wampler provides a distilled overview of Ray, an open source system for scaling Python systems from single machines to large clusters. If you are interested in additional insights, register for the upcoming Ray Summit. Introduction This post is for people making technology d … | Continue reading


@blog.dominodatalab.com | 4 years ago

Evaluating Generative Adversarial Networks (GANs)

This article provides concise insights into GANs to help data scientists and researchers assess whether to investigate GANs further. If you are interested in a tutorial as well as hands-on code examples within a Domino project, then consider attending the upcoming webinar, “Gener … | Continue reading


@blog.dominodatalab.com | 4 years ago

Data Drift Detection for Image Classifiers

This article covers how to detect data drift for models that ingest image data as their input in order to prevent their silent degradation in production. Run the example in a complementary Domino project. Introduction: preventing silent model degradation in production In the real … | Continue reading


@blog.dominodatalab.com | 4 years ago

Model Interpretability: The Conversation Continues

This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumbier, and Reza Abbasi-Asl’s recent paper, “Definitions, methods, and applicati … | Continue reading


@blog.dominodatalab.com | 4 years ago

Model Interpretability

This Domino Data Science Field Note covers a proposed definition of interpretability and a distilled overview of the PDR framework. Insights are drawn from Bin Yu, W. James Murdoch, Chandan Singh, Karl Kumbier, and Reza Abbasi-Asl’s recent paper, “Definitions, methods, and applicati … | Continue reading


@blog.dominodatalab.com | 4 years ago

Being Model-Driven: Metrics and Monitoring

This article covers a couple of key Machine Learning (ML) vital signs to consider when tracking ML models in production to ensure model reliability, consistency and performance in the future. Many thanks to Don Miner for collaborating with Domino on this article. For additional v … | Continue reading


@blog.dominodatalab.com | 4 years ago

Collecting, Prepping, Plotting Data: Social-Media Influence in the NBA

This article provides insight on the mindset, approach, and tools to consider when solving a real-world ML problem. It covers questions to consider as well as collecting, prepping and plotting data. A complementary Domino project is available. Introduction Collecting and prepping … | Continue reading


@blog.dominodatalab.com | 4 years ago

Clustering in R

This article covers clustering including K-means and hierarchical clustering. A complementary Domino project is available. Introduction Clustering is a machine learning technique that enables researchers and data scientists to partition and segment data. Segmenting data into appr … | Continue reading


@blog.dominodatalab.com | 4 years ago

Themes and Conferences per Pacoid

Paco Nathan’s latest article covers data practices from the National Oceanic and Atmospheric Administration (NOAA) Environment Data Management (EDM) workshop as well as updates from the AI Conference. Introduction Welcome back to our monthly burst of themespotting and conference … | Continue reading


@blog.dominodatalab.com | 4 years ago

Become a Full Stack Data Science Company (2018)

Hoda Eydgahi is a Data Science Manager at Stitch Fix and a Scout for Sequoia Capital. Previously, Hoda was the first Data Scientist at Color Genomics as well as a Co-founder and CTO of Bluelight Global. An entrepreneur at heart, she is an advisor to and investor in numerous start … | Continue reading


@blog.dominodatalab.com | 4 years ago

Understanding Causal Inference with Python

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction … | Continue reading


@blog.dominodatalab.com | 4 years ago

Understanding Causal Inference

This article covers causal relationships and includes a chapter excerpt from the book Machine Learning in Production: Developing and Optimizing Data Science Workflows and Applications by Andrew Kelleher and Adam Kelleher. A complementary Domino project is available. Introduction … | Continue reading


@blog.dominodatalab.com | 4 years ago

Time Series with R

This article delves into methods for analyzing multivariate and univariate time series data. A complementary Domino project is available. Introduction Conducting exploratory analysis and extracting meaningful insights from data are core components of research and data science wor … | Continue reading


@blog.dominodatalab.com | 4 years ago

Exploring US Real Estate Values with Python

This post covers data exploration using machine learning and interactive plotting. If interested in running the examples, there is a complementary Domino project available. Introduction Models are at the heart of data science. Data exploration is vital to model development and is … | Continue reading


@blog.dominodatalab.com | 4 years ago

Natural Language in Python Using SpaCy: An Introduction

This article provides a brief introduction to natural language using spaCy and related libraries in Python. The complementary Domino project is also available. Introduction This article and paired Domino project provide a brief introduction to working with natural language (somet … | Continue reading


@blog.dominodatalab.com | 4 years ago

Natural Language in Python Using SpaCy

This article provides a brief introduction to natural language using spaCy and related libraries in Python. The complementary Domino project is also available. Introduction This article and paired Domino project provide a brief introduction to working with natural language (somet … | Continue reading


@blog.dominodatalab.com | 4 years ago

HyperOpt: Bayesian Hyperparameter Optimization

This article covers how to perform hyperparameter optimization using a sequential model-based optimization (SMBO) technique implemented in the HyperOpt Python package. There is a complementary Domino project available. Introduction Feature engineering and hyperparameter optimizat … | Continue reading


@blog.dominodatalab.com | 4 years ago

Creating interactive crime maps with Folium

You can see this Domino project here. I get very excited about a nice map. But when it comes to creating maps in Python, I have struggled to find the right library in the ever-changing jungle of Python libraries. After some research I discovered Folium, which makes it easy to cre … | Continue reading


@blog.dominodatalab.com | 4 years ago

Deep Reinforcement Learning

This article provides an excerpt, “Deep Reinforcement Learning”, from the book Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The article includes an overview of reinforcement learning theory with a focus on deep Q-learning. It also covers using Keras to construct a … | Continue reading


@blog.dominodatalab.com | 4 years ago

Tuning Hyperparameters and Pipelines

This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluate hyperparameters including GridSearch and RandomizedSearch as well as build … | Continue reading


@blog.dominodatalab.com | 4 years ago

Towards Predictive Accuracy: Tuning Hyperparameters and Pipelines

This article provides an excerpt of “Tuning Hyperparameters and Pipelines” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. The excerpt and complementary Domino project evaluate hyperparameters including GridSearch and RandomizedSearch as well as build … | Continue reading


@blog.dominodatalab.com | 4 years ago

Deep Learning Illustrated: Building Natural Language Processing Models

Many thanks to Addison-Wesley Professional for providing the permissions to excerpt “Natural Language Processing” from the book, Deep Learning Illustrated by Krohn, Beyleveld, and Bassens. The excerpt covers how to create word vectors and utilize them as an input into a deep lear … | Continue reading


@blog.dominodatalab.com | 4 years ago

Manual Feature Engineering

Many thanks to AWP Pearson for the permission to excerpt “Manual Feature Engineering: Manipulating Data for Fun and Profit” from the book, Machine Learning with Python for Everyone by Mark E. Fenner. There is also a complementary Domino project available. Introduction Many data s … | Continue reading


@blog.dominodatalab.com | 4 years ago

Deep Learning on Imagery and Text

Niels Kasch, cofounder of Miner & Kasch, an AI and Data Science consulting firm, provides insight from a deep learning session that occurred at the Maryland Data Science Conference. Introduction This blog recaps Miner & Kasch’s first Maryland Data Science Conference hosted at UMB … | Continue reading


@blog.dominodatalab.com | 4 years ago

Advanced Jupyter Notebook Tricks (2015)

I love Jupyter notebooks! They’re great for experimenting with new ideas or data sets, and although my notebook “playgrounds” start out as a mess, I use them to crystallize a clear idea for building my final projects. Jupyter is so great for interactive exploratory analysis that … | Continue reading


@blog.dominodatalab.com | 4 years ago

Data Ethics: Contesting Truth and Rearranging Power

This Domino Data Science Field Note covers Chris Wiggins’s recent data ethics seminar at Berkeley. The article focuses on 1) proposed frameworks for defining and designing for ethics and for understanding the forces that encourage industry to operationalize ethics, as well as 2) … | Continue reading


@blog.dominodatalab.com | 4 years ago

Data Science Past and Future

Paco Nathan presented “Data Science, Past & Future” at Rev. This blog post provides a concise session summary, a video, and a written transcript. Session Summary At Rev’s “Data Science, Past & Future”, Paco Nathan covered contextual insight into some common impactful themes ove … | Continue reading


@blog.dominodatalab.com | 4 years ago

Seeking Reproducibility Within Social Science: Search and Discovery

Julia Lane, NYU Professor, Economist and cofounder of the Coleridge Initiative, presented “Where’s the Data: A New Approach to Social Science Search & Discovery” at Rev. Lane described the approach that the Coleridge Initiative is taking to address the science reproducibility cha … | Continue reading


@blog.dominodatalab.com | 4 years ago

A Practitioner’s Guide to Deep Learning with Ludwig

Joshua Poduska provides a distilled overview of Ludwig including when to use Ludwig’s command-line syntax and when to use its Python API. Introduction New tools are constantly being added to the deep learning ecosystem. It can be fun and informative to look for trends in the type … | Continue reading


@blog.dominodatalab.com | 4 years ago

Data Science at The New York Times

Chris Wiggins, Chief Data Scientist at The New York Times, presented “Data Science at the New York Times” at Rev. Wiggins advocated that data scientists find problems that impact the business; re-frame the problem as a machine learning (ML) task; execute on the ML task; and commu … | Continue reading


@blog.dominodatalab.com | 4 years ago

Crash Course on Product Management for AI

Pete Skomoroch presented “Product Management for AI” at Rev. This post provides a distilled summary, video, and full transcript. Session Summary Pete Skomoroch’s “Product Management for AI” session at Rev provided a “crash course” on what product managers and leaders need to know … | Continue reading


@blog.dominodatalab.com | 4 years ago

Product Management for AI

Pete Skomoroch presented “Product Management for AI” at Rev. This post provides a distilled summary, video, and full transcript. Session Summary Pete Skomoroch’s “Product Management for AI” session at Rev provided a “crash course” on what product managers and leaders need to know … | Continue reading


@blog.dominodatalab.com | 4 years ago

MNIST Expanded: 50k New Samples Added

This post provides a distilled overview of the rediscovery of 50,000 samples within the MNIST dataset. MNIST: The Potential Danger of Overfitting Recently, Chhavi Yadav (NYU) and Leon Bottou (Facebook AI Research and NYU) indicated in their paper, “Cold Case: The Lost MNI … | Continue reading


@blog.dominodatalab.com | 4 years ago

Measuring Data Science Business Value (2017)

This blog post covers metrics that help data science leaders ensure their team’s work is aligned to business value. Data science managers and executives, whether coming up through the technical side or the manager side, all struggle with providing visibility for their team and ho … | Continue reading


@blog.dominodatalab.com | 4 years ago

Machine Learning Product Management: Lessons Learned

This Domino Data Science Field Note covers Pete Skomoroch’s recent Strata London talk. It focuses on his ML product management insights and lessons learned. If you are interested in hearing more practical insights on ML or AI product management, then consider attending Pete’s upc … | Continue reading


@blog.dominodatalab.com | 4 years ago