Remote AWS Certification Exam

In this post, I will talk about my experience with the AWS Solutions Architect Associate certification and how I prepared for it. | Continue reading


@luminousmen.com | 3 years ago

Dive into Spark Memory

Efficient memory usage is critical for good performance in Spark, which has its own internal memory model. | Continue reading


@luminousmen.com | 3 years ago

Data Management Skills

In data management, we are still in the Wild West - new trends emerge every day. How do you stay relevant in the industry? | Continue reading


@luminousmen.com | 3 years ago

Kubernetes 101

As we continue to move our applications from servers and virtual machines to containers, Kubernetes is inevitable. | Continue reading


@luminousmen.com | 3 years ago

M-Motivation

M-Motivation, M-money, M-Mastery, M-Mystery | Continue reading


@luminousmen.com | 3 years ago

AWS Lambda Abuse

We are investigating possible ways to keep our application on AWS Lambda up and running under a DDoS attack. | Continue reading


@luminousmen.com | 3 years ago

Thoughts on (Micro)Services

Have you heard of microservices? Of course you have - any housewife already knows how to deploy them on a k8s cluster. Here's some thinking about them. | Continue reading


@luminousmen.com | 3 years ago

Data Lake vs. Data Warehouse

Data lakes and data warehouses seem similar, but there are differences. | Continue reading


@luminousmen.com | 3 years ago

Spark Tips. Partition Tuning

Data partitioning is critical to data processing performance, especially when processing large volumes of data in Spark. Here are some partitioning tips. | Continue reading


@luminousmen.com | 3 years ago

Python Resource Limitation

Specifying exact resource limits in Python using built-in modules | Continue reading


@luminousmen.com | 4 years ago

Hive GC overhead limit exceeded

Here are the details of the ORC data format for Hive and why the GC overhead limit exceeded error occurs. | Continue reading


@luminousmen.com | 4 years ago

Asynchronous Programming. Python3.5

In this post, we will figure out how to write concurrent applications in Python 3.5 | Continue reading


@luminousmen.com | 4 years ago

Asynchronous Programming. Await the Future

Synchrony vs. asynchrony: why was asynchrony created in the first place? What is the event loop, and how is it all connected with cooperative multitasking? This post answers these questions. | Continue reading


@luminousmen.com | 4 years ago

Exploratory Data Analysis

Data analysis is valuable because it allows you to be more confident that future results will be reliable, correctly interpreted and applied | Continue reading


@luminousmen.com | 4 years ago

Descriptive and Inferential Statistics

Descriptive statistics will teach you the basic concepts used to describe the data sample | Continue reading


@luminousmen.com | 4 years ago

Steps of Scrum

The agile development process is best suited for both the development team and the customer. It is known for its main idea of "check and adapt". | Continue reading


@luminousmen.com | 4 years ago

How to start your blog for 20 cents

For highly loaded systems, serverless is a simple way to scale infinitely, and for side projects it is a great opportunity for free hosting. | Continue reading


@luminousmen.com | 4 years ago

How to estimate time for a project/task accurately

The manager came to you, but you have not estimated the time for your tasks? In this post I will explain the general methods of how to estimate the time for a task or project | Continue reading


@luminousmen.com | 4 years ago

What Are the Best Software Engineering Principles?

Have you ever thought about the basic rules of hygiene and safety in software engineering? | Continue reading


@luminousmen.com | 4 years ago

The Python Style Guidelines

It is much easier to write code or review it when you follow a strictly defined, practical style guide for Python, like PEP8 but better | Continue reading


@luminousmen.com | 4 years ago

The 5-minute guide to using bucketing in Pyspark

A guide to bucketing - an optimization technique that uses buckets to determine data partitioning and avoid data shuffle. | Continue reading


@luminousmen.com | 4 years ago

Spark tips. Don't collect data on driver

Apache Spark is the major talking point in Big Data pipelines, boasting performance 10-100x faster than comparable tools. These speeds are achievable with the tips described here. | Continue reading


@luminousmen.com | 4 years ago

How to not leap in time using Python

If your application needs to measure elapsed time, you need a timer that will give the right answer even if the user changes the time on the system clock | Continue reading


@luminousmen.com | 4 years ago
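
As a minimal sketch of the idea in this teaser (not taken from the linked post): Python's standard library offers `time.monotonic()`, a clock that is not affected by system clock updates, so it is a reasonable choice for measuring elapsed time. The helper name `measure` is hypothetical and used only for illustration.

```python
import time


def measure(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds).

    Hypothetical helper for illustration. It uses time.monotonic(),
    which cannot go backwards and is not affected by changes to the
    system clock, unlike time.time().
    """
    start = time.monotonic()
    result = func(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed


# Time a simple computation; elapsed is never negative.
result, elapsed = measure(sum, range(1_000_000))
print(f"sum={result}, took {elapsed:.6f} s")
```

`time.time()` reads the wall clock, which the user or an NTP sync can move in either direction, so a difference of two `time.time()` readings can come out negative or far too large.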

Spark Tips. DataFrame API

Apache Spark is the major talking point in Big Data pipelines, boasting performance 10-100x faster than comparable tools. But how achievable are these speeds if you use a slow Python interpreter? | Continue reading


@luminousmen.com | 4 years ago

Schema-on-Read vs. Schema-on-Write

When we talk about working with data, we are usually doing it in a system that belongs to one of two types - Schema On Read or Schema On Write. How do they differ? | Continue reading


@luminousmen.com | 4 years ago

Azure Blob Storage with Pyspark

Azure Blob Storage is a Microsoft solution for storing objects in the cloud. It is optimized for storing large amounts of data and can be easily accessed by your Python/Spark application | Continue reading


@luminousmen.com | 4 years ago

Demystifying Hypothesis Testing

Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. | Continue reading


@luminousmen.com | 4 years ago

Continuous Integration/Continuous Delivery

Continuous Integration/Continuous Delivery | Continue reading


@luminousmen.com | 4 years ago

Big Data File Formats

The evaluation of the major data formats and storage engines for the Big Data ecosystem has shown the pros and cons of each of them for various metrics. In this post I'll try to compare the CSV, JSON, Parquet and Avro formats using Apache Spark. | Continue reading


@luminousmen.com | 4 years ago

Spark. Anatomy of Spark Application

This post closely examines the components of a Spark application, looks at how these components work together, and looks at how Spark applications run on a YARN cluster. | Continue reading


@luminousmen.com | 4 years ago

Data Science. The Central Limit Theorem and Sampling

Continue reading


@luminousmen.com | 4 years ago

Spark Core Concepts Explained

In this post we will go through Apache Spark core concepts such as RDD and DAG | Continue reading


@luminousmen.com | 4 years ago

Things you need to know about Hadoop and YARN as a Spark developer

Spark has been part of Hadoop since 2.0 and is one of the most useful technologies for Python Big Data engineers. Before going into depth on what Apache Spark consists of, we will briefly look at the Hadoop platform and what YARN does there. | Continue reading


@luminousmen.com | 4 years ago

Data Science. Correlation

Understand when to use the Pearson product-moment correlation, what range of values its coefficient can take and how to measure the strength of association. | Continue reading


@luminousmen.com | 4 years ago

Data Science. Measures

In order to assess and describe the distribution of characteristics, we need to know a couple of things: the values of these characteristics that are typical for the distribution under study, and how typical they are. | Continue reading


@luminousmen.com | 5 years ago

Data Science. Probability Distributions

There are many distributions, but here we will be talking about the most common and widely used ones. | Continue reading


@luminousmen.com | 5 years ago

Data Science. Probability

Knowing probability and its applications is important for working effectively on data science problems, and this post will remind you what probability actually is. | Continue reading


@luminousmen.com | 5 years ago

Asynchronous programming. Blocking I/O and non-blocking I/O

In this post, we will be talking about blocking and non-blocking network I/O in order to explain the concept of asynchronous programming | Continue reading


@luminousmen.com | 5 years ago

Concurrency and parallelism are two different things

It may seem that there is no difference between concurrency and parallelism, but this is because you did not understand the essence of the matter. Let's try to understand how they differ. | Continue reading


@luminousmen.com | 5 years ago

__context__ vs. __cause__ attributes in exception handling

__context__ vs __cause__ attributes in exception handling in Python 3 | Continue reading


@luminousmen.com | 5 years ago
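
A minimal sketch of the distinction this teaser names, based on standard Python 3 behaviour rather than on the linked post: raising inside an `except` block sets `__context__` implicitly, while `raise ... from ...` also sets `__cause__`. The function names below are hypothetical.

```python
def implicit_chain():
    """Raise a new exception while handling another (implicit chaining)."""
    try:
        raise KeyError("missing")
    except KeyError:
        # No "from": Python records the KeyError in __context__ only.
        raise ValueError("bad value")


def explicit_chain():
    """Raise a new exception with an explicit cause (explicit chaining)."""
    try:
        raise KeyError("missing")
    except KeyError as exc:
        # "from exc" sets __cause__ and marks the context as suppressed.
        raise ValueError("bad value") from exc


def capture(func):
    """Call func and return the ValueError it raises."""
    try:
        func()
    except ValueError as err:
        return err


implicit = capture(implicit_chain)
explicit = capture(explicit_chain)

print(repr(implicit.__context__), repr(implicit.__cause__))
print(repr(explicit.__context__), repr(explicit.__cause__))
```

In tracebacks, an implicit chain is reported as "During handling of the above exception, another exception occurred", while an explicit one is reported as "The above exception was the direct cause of the following exception".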

What is the definition of a good software engineer?

What is the definition of a good software engineer? This question is meant to be personal: it focuses on the thoughts of the person you are asking. I will share my thoughts in this post. | Continue reading


@luminousmen.com | 5 years ago

Ode to Unit Tests

Today we will talk about Unit Tests, which are placed at the bottom of the testing pyramid and have the shortest feedback cycle. | Continue reading


@luminousmen.com | 5 years ago

Resolve cython and numpy dependencies on setup step

How to resolve Cython and NumPy dependencies at the setup step. | Continue reading


@luminousmen.com | 5 years ago

How to estimate time for a project/task accurately

In this post I will explain the general methods of how to estimate the time for a task or project and show a step-by-step algorithm for this | Continue reading


@luminousmen.com | 5 years ago

Basic architecture post

A simple post about the distinction between the customer problem, the created product, and the architecture solution. | Continue reading


@luminousmen.com | 5 years ago

What Are the Best Software Engineering Principles?

In this post I'll explain which software engineering principles I consider the best. | Continue reading


@luminousmen.com | 5 years ago

Continuous Integration/Continuous Delivery

Continuous Integration/Continuous Delivery | Continue reading


@luminousmen.com | 5 years ago

Python Interview Questions. Part III. Senior

Python interview questions. Part III. Senior | Continue reading


@luminousmen.com | 5 years ago