RStudio, the most widely used IDE for R programming, just got better. You should update it now.
This week brought good news for R users: RStudio released a preview of a major new version, 1.4. Since version 1.2, RStudio has named its releases after flowers. Version 1.4 is called “Wax Begonia.” It comes with some really cool features for code readability, document writing, and integration of Python objects.
If you work with data, you have probably come across terms like “tests”, “scores”, and “values” preceded by letters like ‘F’, ‘P’, ‘R’, ‘T’, and ‘Z’. This article gives a layman's explanation of a few such statistical terms and concepts that are often encountered in the world of Data Science.
This article is not a comprehensive explanation but a concise, high-level compilation of some key statistical measures. Each of the concepts below has several dedicated articles. I will provide links to a few such articles for your reference.
The following topics will be covered:
Put simply, a hypothesis…
Scikit-learn, Python’s machine-learning library, just got better. It’s the best time to update it.
In December 2020, scikit-learn released a major update, version 0.24. It is the last stable release of the 0.x series. The next release will supposedly be version 1.0, which is currently in development.
I will provide an overview of some important features introduced in version 0.24. Given the large number of notable features, I highly recommend upgrading your scikit-learn library.
I use Python environments, so I first activated my desired environment and then upgraded the…
Every now and then, I see a new Medium article listing “7 must-have skills”, “10 important skills”, “top 3 skills”, etc. for a data scientist. All such posts acknowledge SQL as a must-know skill for data scientists. Coming from a computational physics background, I found SQL foreign until I moved to industry. In academia, you don’t use it much. However, most companies store their data in relational databases. SQL (Structured Query Language) is a query language used to communicate with a database: fetching, storing, and manipulating data, among other things. You can think of SQL as the lingua franca…
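To make the "fetch, store, manipulate" idea concrete, here is a minimal sketch that runs SQL from Python using the standard-library `sqlite3` module. The `employees` table, its columns, and the rows are hypothetical and exist only for illustration; a real company database would have its own schema and, most likely, a different database engine.

```python
import sqlite3

# In-memory database, so this sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Store: create a hypothetical table and insert a few made-up rows.
cur.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
cur.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ada", "Research", 120.0),
        ("Linus", "Engineering", 110.0),
        ("Grace", "Research", 130.0),
    ],
)

# Manipulate: update one row in place.
cur.execute("UPDATE employees SET salary = salary + 10 WHERE name = ?", ("Linus",))

# Fetch: query only the rows and columns we care about.
cur.execute(
    "SELECT name, salary FROM employees WHERE department = ? ORDER BY name",
    ("Research",),
)
research_staff = cur.fetchall()
print(research_staff)

cur.execute("SELECT salary FROM employees WHERE name = ?", ("Linus",))
linus_salary = cur.fetchone()[0]
print(linus_salary)

conn.close()
```

The `?` placeholders let the database driver substitute the values, which is generally safer than building query strings by hand.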
In this article, I will talk about dictionaries. This is the second article in the series named “Data Structures in Python”. The first part of this series was about lists.
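Dictionaries are important data structures in Python that use keys for indexing. They are collections of items (key-value pairs) that are accessed by key rather than by position. Since Python 3.7, dictionaries preserve the order in which items were inserted; in older versions the order was not guaranteed. The keys must be hashable, which in practice means immutable types such as strings, numbers, and tuples. Just like lists, dictionaries can hold heterogeneous values, e.g., integers, floats, strings, NaN, Booleans, lists, arrays, and even nested dictionaries.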
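The properties above can be sketched in a few lines. The `inventory` dictionary and its contents are made up for illustration. Note that on Python 3.7 and later, a dictionary keeps its items in insertion order, and that a mutable object such as a list cannot be used as a key.

```python
# A hypothetical dictionary holding heterogeneous values.
inventory = {
    "apples": 12,                                   # integer
    "price": 0.5,                                   # float
    "organic": True,                                # Boolean
    "tags": ["fruit", "red"],                       # list
    "supplier": {"name": "Farm A", "city": "Pune"}, # nested dictionary
}

# Values are looked up by key, not by position.
apples = inventory["apples"]
city = inventory["supplier"]["city"]

# Dictionaries are mutable: update an existing key and add a new one.
inventory["apples"] += 3
inventory["origin"] = "local"

# On Python 3.7+, iteration follows insertion order.
key_order = list(inventory)
print(key_order)

# Keys must be hashable; a list is mutable and therefore rejected.
try:
    inventory[["a", "list"]] = 1
    list_key_allowed = True
except TypeError:
    list_key_allowed = False
print(list_key_allowed)
```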
This article will give you a clear understanding and enable you to work proficiently with…
This article is about lists. They are the most versatile and resourceful built-in data structure in Python. They can simultaneously hold heterogeneous data, i.e., integers, floats, strings, NaN, Booleans, functions, etc., within the same list. They are an ordered sequence of items, which means the order of the elements is preserved when accessing lists. They are mutable, i.e., you can change (add, delete, modify) any item in the list. They can hold duplicate items, unlike sets, another data structure in Python.
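Here is a short sketch of those four properties (heterogeneous, ordered, mutable, duplicates allowed). The list contents are arbitrary examples chosen for illustration.

```python
# One list holding an integer, a float, a string, NaN, a Boolean, and a function.
mixed = [1, 2.5, "three", float("nan"), True, len]

# Ordered: positions are stable, so indexing is meaningful.
first = mixed[0]

# Mutable: add, modify, and delete items in place.
mixed.append("three")   # add (this also creates a duplicate)
mixed[1] = 99           # modify
del mixed[0]            # delete

# Duplicates are kept in a list...
count_three = mixed.count("three")

# ...whereas a set keeps only unique items.
deduplicated = set(["a", "b", "a"])

print(first, count_three, len(mixed), deduplicated)
```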
After reading this article, you will gain a clear understanding and ability to work at an advanced level…
All men are sculptors, constantly chipping away the unwanted parts of their lives, trying to create their idea of a masterpiece … Eddie Murphy
If you ever wonder how to filter or handle unwanted, missing, or invalid data in your data science projects or, in general, Python programming, then you must learn the helpful concept of Masking. In this post, I will first guide you through an example for 1-d arrays, followed by 2-d arrays (matrices), and then provide an application of Masking in a Data Science Problem.
Suppose we have the following NumPy array:
import numpy as np
arr…
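Since the original example is cut off here, the following is a separate minimal sketch of masking, not the article's own code. The array `values` and the "negative numbers are invalid" rule are assumptions made for illustration. It shows a Boolean mask used for filtering, `np.where` for replacing unwanted entries, and NumPy's masked arrays (`np.ma`) for ignoring them in computations.

```python
import numpy as np

# Hypothetical data in which negative entries are treated as invalid.
values = np.array([3, -1, 7, -5, 10, 2])

# A Boolean mask: True where the entry is wanted.
mask = values > 0

# Filtering: keep only the entries where the mask is True.
positives = values[mask]

# Replacing: swap unwanted entries for a placeholder instead of dropping them.
cleaned = np.where(values < 0, 0, values)

# Masked array: invalid entries stay in place but are skipped in computations.
masked = np.ma.masked_where(values < 0, values)
masked_mean = masked.mean()

# The same Boolean indexing works on 2-d arrays (it returns a flat 1-d result).
grid = np.array([[1, -2], [-3, 4]])
grid_positive = grid[grid > 0]

print(positives, cleaned, masked_mean, grid_positive)
```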
When working with data, big or small, you often want to compare things side by side or plot different attributes or features individually. In such cases, a single plot is not enough. Thus, you need to know the art of working with subplots.
This article will focus on the concept of subplots. It will teach you six unique ways to create very simple and very complex grids in Python using Matplotlib.
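As a starting point, here is a minimal sketch of one common way to build a grid, `plt.subplots`. It is not necessarily one of the six ways the article goes on to cover, and the plotted data is made up. The non-interactive `Agg` backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# Create a 2 x 2 grid of axes in a single figure.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(6, 4))

x = [0, 1, 2, 3]

# axes is a 2-d NumPy array of Axes objects; .flat iterates over all four.
for index, ax in enumerate(axes.flat):
    ax.plot(x, [value ** (index + 1) for value in x])
    ax.set_title(f"Panel {index + 1}")

fig.tight_layout()  # reduce overlap between titles and tick labels

grid_shape = axes.shape
n_axes = len(fig.axes)
print(grid_shape, n_axes)

plt.close(fig)
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of closing the figure.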
“For every failure, there’s an alternative course of action. You just have to find it. When you come to a roadblock, take a detour” — Mary Kay Ash
A few days ago, Matplotlib released version 3.3, the latest in the third generation of its family. It introduced some really exciting features, and I strongly recommend that you upgrade your Matplotlib today. This post will guide you through them using some examples.
I updated to the latest version in my virtual environment using the following command. For fresh installations, refer to the official installation guidelines.
pip install matplotlib --upgrade
Let’s take a look at the latest highlight features of Matplotlib 3.3.
A much less verbose way to generate subplots, which allows you to visually lay out your axes…
We are currently living in the stable age of Python 3.8, and the latest stable version of Python, 3.8.4, was released last week. Python 3.9 is already in the beta phase of its development: a beta version (3.9.0b4) was pre-released on July 3, 2020, and the fifth beta pre-release is scheduled for tomorrow. The first stable release of 3.9 is expected in October 2020. The development of Python 3.10 also kicked off in May 2020, and its first beta version is expected in May 2021.
For Python lovers, clearly, interesting times lie ahead. Browsing through the release schedules of three…
Data Scientist | Computational Materials Scientist (PhD) | Tech Writer | Stack Overflow Contributor for Python and Matplotlib