How to Install Presto and Query Distributed Data on Apache Hive and HDFS

Presto is an open source distributed query engine built for Big Data, enabling high performance SQL access to a large variety of data sources including HDFS, PostgreSQL, MySQL, Cassandra, MongoDB, Elasticsearch and Kafka among others. | Continue reading


@janakiev.com | 3 years ago

How to Manage Apache Airflow with Systemd on Debian or Ubuntu

Apache Airflow is a powerful workflow management system which you can use to automate and manage complex Extract Transform Load (ETL) pipelines. In this tutorial you will see how to integrate Airflow with the systemd system and service manager which is available on most Linux sy … | Continue reading


@janakiev.com | 4 years ago

All the Common Ways to Execute Shell Commands with Python

Python is a wonderful language for scripting and automating workflows and it is packed with useful tools out of the box with the Python Standard Library. A common thing to do, especially for a sysadmin, is to execute shell commands. But what usually will end up in a bash or batch … | Continue reading


@janakiev.com | 4 years ago

Using Virtual Environments in Jupyter Notebook and Python

Are you working with Jupyter Notebook and Python? Do you also want to benefit from virtual environments? In this tutorial you will see how to do just that with Anaconda or Virtualenv/venv. | Continue reading


@janakiev.com | 5 years ago

Analyzing Your File System and Folder Structures with Python

Say you have an external hard drive with layers upon layers of cryptically named folders and intricate mazes of directories (like here, or here). How can you make sense of this mess? Python offers various tools in the Python standard library to deal with your file system and the … | Continue reading


@janakiev.com | 5 years ago

A Couple of Recipes for File and Directory Comparison with Python

The Python standard library offers a powerful set of tools out of the box including file system handling. In this quick little article you’ll see a couple of useful recipes to compare files and directories with the filecmp module. | Continue reading


@janakiev.com | 5 years ago

Predict Economic Indicators with OpenStreetMap

OpenStreetMap (OSM) is a massive collaborative map of the world, built and maintained mostly by volunteers. On the other hand, there exist various indicators to measure economic growth, prosperity, and produce of a country. What if we use OpenStreetMap to predict those economic i … | Continue reading


@janakiev.com | 5 years ago

Querying Wikidata with Python and SPARQL

In this article, we will be going through building queries for Wikidata with Python and SPARQL by taking a look at where mayors in Europe are born. This tutorial is building up the knowledge to collect the data responsible for this interactive visualization from the header image whi … | Continue reading


@janakiev.com | 5 years ago

Where Do Mayors Come From? Wikidata Visualization Made with Deckgl

This project is the exploration of mayors in Europe with Wikidata and Python. The data is an interactive visualization which is done with the help of deck.gl. Each arc represents the birth place (gray) to the city they are mayor in (color) and the points represent mayors which ar … | Continue reading


@janakiev.com | 5 years ago

Batch Geocoding with Python

You have a list of addresses, but you need to get GPS coordinates to crunch some numbers. Don’t despair, there is geocoding for this and Python provides some simple means to help deal with the APIs out there. | Continue reading


@janakiev.com | 5 years ago

Compare Countries and Cities with OpenStreetMap and T-SNE

There are many ways to compare countries and cities and many measurements to choose from. We can see how they perform economically, or how their demographics differ, but what if we take a look at data available in OpenStreetMap? In this article we explore just that with the help … | Continue reading


@janakiev.com | 5 years ago

Loading Data from OpenStreetMap with Python and the Overpass API

Have you ever wondered where most Biergarten in Germany are or how many banks are hidden in Switzerland? OpenStreetMap is a great open source map of the world which can give us some insight into these and similar questions. There is a lot of data hidden in this data set, full of … | Continue reading


@janakiev.com | 5 years ago