Photo by bruce mars on Unsplash

An Overview of the Most Interesting Features of RStudio 1.4

RStudio, the most widely used IDE for R programming, just got better. You should update it now.

This week brought good news for R users: RStudio released a preview of a major new version, 1.4. Since version 1.2, RStudio has named its releases after flowers, and version 1.4 is called “Wax Begonia.” It comes with some really useful features for code readability, document writing, and integration of Python objects.

  • New Users: You can download a free RStudio Desktop version here for Windows, macOS, and several Linux distributions.
  • Existing Users: Launch RStudio, navigate to the “Help” menu…


Photo by Dan Farrell on Unsplash

A compilation of some must-know statistical concepts

If you work with data, you have probably come across terms like “tests”, “scores”, and “values” preceded by letters such as ‘F’, ‘P’, ‘R’, ‘T’, and ‘Z’. This article gives a layman's explanation of a few such statistical terms and concepts that come up often in data science.

This article is not a comprehensive explanation but a concise, high-level compilation of some key statistical measures. Each of the concepts below has several dedicated articles, and I will link to a few of them for your reference.

The following topics will be covered:

1) H_0 and H_a — Hypothesis Test

Put simply, a hypothesis…


Photo by MI PHAM on Unsplash

An Overview of the Most Important Features in Version 0.24

Scikit-learn, Python’s machine-learning library, just got better. It’s the best time to update it.

In December 2020, scikit-learn released a major update, version 0.24. It is the last stable release of version 0; the next release is expected to be version 1.0, which is currently in development.

I will provide an overview of some important features introduced in version 0.24. Given the large number of notable features, I highly recommend upgrading your scikit-learn library.

  • Upgrading to Version 0.24
  • Major Features
  • Other Interesting Features

Upgrading to Version 0.24

I use Python environments, so I first activated the desired environment and then upgraded the…


Photo by Ben White on Unsplash

Basic must-know SQL for aspiring Data Scientists or Analysts

Every now and then, I see a new Medium article listing the “7 must-have skills”, “10 important skills”, or “top 3 skills” for a data scientist. All such posts acknowledge SQL as a must-know skill. Coming from a computational physics background, I found SQL foreign until I moved to industry. In academia, you don’t use it much. Most companies, however, store their data in relational databases. SQL (Structured Query Language) is a query language used to communicate with a database: to fetch, store, and manipulate data. You can think of SQL as the lingua franca…
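To make the fetch, store, and manipulate operations concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The employees table, its columns, and the rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Store: create a table and insert a few rows.
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Research", 120.0), ("Bob", "Sales", 80.0), ("Cy", "Research", 100.0)],
)

# Manipulate: update the matching rows in place.
cur.execute("UPDATE employees SET salary = salary + 10 WHERE department = 'Sales'")

# Fetch: aggregate per department and read the result back.
cur.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
)
rows = cur.fetchall()
conn.close()
```

Here rows holds one (department, average salary) tuple per department. The same SQL statements would work largely unchanged against other relational databases, although connection details differ.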


Photo by Mathilde Decourcelle on Unsplash

One of the most important data structures in Python

In this article, I will talk about dictionaries. This is the second article in the series named “Data Structures in Python”. The first part of this series was about lists.

Dictionaries are important data structures in Python that use keys for indexing. They are collections of items (key-value pairs) that you access by key rather than by position. Before Python 3.7 the order of items was not guaranteed; from Python 3.7 onward, dictionaries preserve insertion order. The keys must be immutable (hashable) objects. Just like lists, dictionaries can hold heterogeneous values, i.e., integers, floats, strings, NaN, Booleans, lists, arrays, and even nested dictionaries.

This article will give you a clear understanding and enable you to work proficiently with
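As a quick illustration of these properties, the following minimal sketch builds a dictionary with mixed value types and a nested dictionary, then shows which kinds of keys are accepted. The record and its field names are hypothetical.

```python
# Hypothetical record mixing several value types, including a nested dictionary.
record = {
    "name": "Ada",
    "age": 36,
    "height_m": 1.7,
    "is_member": True,
    "scores": [90, 85],
    "address": {"city": "London"},
}

# Values are looked up by key, not by position.
city = record["address"]["city"]

# Dictionaries are mutable: update an existing key or add a new one.
record["age"] = 37
record["email"] = None

# Keys must be hashable: a tuple works, a list raises TypeError.
record[("x", "y")] = "tuple key"
try:
    record[["x", "y"]] = "list key"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
```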


Photo by Timon Klauser on Unsplash

Get proficient with lists — Python’s most versatile data structure

This article is about lists. They are the most versatile and resourceful built-in data structure in Python. A single list can hold heterogeneous data, i.e., integers, floats, strings, NaN, Booleans, functions, etc. Lists are an ordered sequence of items, which means the order of the elements is preserved when you access them. They are mutable, i.e., you can add, delete, or modify any item in the list. Unlike “sets”, another data structure in Python, they can hold duplicate items.
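The properties above can be sketched in a few lines. The list contents are hypothetical and chosen only to show mixed types, mutation, and duplicates.

```python
# One list holding an int, a float, a string, NaN, a Boolean, and a function.
mixed = [7, 2.5, "text", float("nan"), True, len]

# Ordered: items are retrieved by position.
second = mixed[1]

# Mutable: add, modify, and delete items in place.
mixed.append(7)        # duplicates are allowed
mixed[2] = "changed"
del mixed[3]           # removes the NaN entry

# A set built from the same items drops the duplicate 7.
unique_items = set(mixed)
```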

After reading this article, you will gain a clear understanding and ability to work at an advanced level…


Photo by Pille-Riin Priske on Unsplash

An interesting feature of NumPy to filter unwanted data

All men are sculptors, constantly chipping away the unwanted parts of their lives, trying to create their idea of a masterpiece … Eddie Murphy

If you have ever wondered how to filter or handle unwanted, missing, or invalid data in your data science projects or in Python programming generally, you should learn about masking. In this post, I will first walk through an example for 1-d arrays, then 2-d arrays (matrices), and finally an application of masking to a data science problem.

1-d Arrays

Suppose we have the following NumPy array:

import numpy as np
arr…
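As a rough illustration of masking on a 1-d array, here is a minimal sketch. It assumes a small hypothetical array in which -1 marks an invalid reading; the article's own data and approach may differ.

```python
import numpy as np

# Hypothetical data: -1 plays the role of an invalid reading.
arr = np.array([4, -1, 7, 2, -1, 9])

# A Boolean mask is True wherever the condition holds.
mask = arr < 0

# Boolean indexing with the inverted mask keeps only the valid entries.
valid = arr[~mask]

# numpy.ma keeps the original shape and hides the masked entries instead.
masked = np.ma.masked_where(arr < 0, arr)
mean_valid = masked.mean()  # computed over the unmasked entries only
```

Boolean indexing returns a shorter array, while the masked array keeps its length and simply ignores the hidden entries in computations such as the mean.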


Photo by Pietro Mattia on Unsplash

Learn six unique ways to visualize your big data

When working with data, big or small, you often want to compare things side by side or plot different attributes or features individually. In such cases, a single figure is not enough, so you need to know how to work with subplots.

This article will focus on the concept of subplots. It will teach you six unique ways to create both simple and complex grids in Python using Matplotlib.

“For every failure, there’s an alternative course of action. You just have to find it. When you come to a roadblock, take a detour” — Mary Kay Ash

Way 1: Using subplots()
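A minimal sketch of this first approach, assuming a 2×2 grid with hypothetical data and panel titles, could look like this. The Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plt.subplots returns the figure and a 2x2 NumPy array of Axes objects.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 4), sharex=True)

# Draw a different hypothetical line in each panel.
for i, ax in enumerate(axes.flat):
    ax.plot([0, 1, 2], [0, i, 2 * i])
    ax.set_title(f"Panel {i + 1}")

fig.tight_layout()
```

Each element of axes is an ordinary Axes object, so every panel can be styled independently.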


Photo by Paola Galimberti on Unsplash

Upgrade your Matplotlib today to the latest release 3.3

A few days ago, Matplotlib released version 3.3, the latest in the third generation of its family. It introduces some really exciting features, and I strongly recommend upgrading your Matplotlib today. This post will guide you through them using some examples.

I updated to the latest version in my virtual environment using the following command. For fresh installations, refer to the official installation guidelines.

pip install matplotlib --upgrade

Let’s take a look at the latest highlight features of Matplotlib 3.3.

1) Semantic way to generate complex subplot grids

A much less verbose way to generate subplots, which allows you to visually lay out your axes…


Photo by Gabriela Gomez on Unsplash

A preview of a few new features of Python 3.10

We are currently in the stable era of Python 3.8; the latest stable version, 3.8.4, was released last week. Python 3.9 is already in its beta phase: a beta version (3.9.0b4) was pre-released on July 3, 2020, and the fifth beta pre-release is scheduled for tomorrow. The first stable release of 3.9 is expected in October 2020. Development of Python 3.10 also kicked off in May 2020, and its first beta version is expected in May 2021.

For Python lovers, clearly, interesting times lie ahead. Browsing through the release schedules of three…

Ankit Gupta

Data Scientist | Computational Materials Scientist (PhD) | Tech Writer | Stack Overflow Contributor for Python and Matplotlib
