Learn about various Python Libraries

Important Libraries in Python

A Python library is a collection of functions and methods that lets you perform many actions without writing the code yourself. Python has a plethora of libraries, but here we will go through the most used ones.

There are libraries for data cleaning, data manipulation, visualization, model building, and even model deployment, among other tasks. This makes them very useful resources for a Data Scientist.

Python Libraries for Data Collection

1. Beautiful Soup: Beautiful Soup is a library for parsing HTML and XML documents. It creates a parse tree from a page, which can then be used to extract data from that page. This process of extracting data from web pages is called web scraping.

2. Scrapy: Scrapy is a framework for extracting the data you need from websites. It is fast and simple to use.

3. Selenium: Selenium is a popular tool for automating browsers. It is primarily used for testing in industry, but it is also very handy for web scraping.
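To make the idea of a parse tree concrete, here is a minimal Beautiful Soup sketch. It assumes the `beautifulsoup4` package is installed (it is imported as `bs4`) and parses a small, made-up HTML string instead of a downloaded page, so it runs without network access. The page content and the `libs` id are purely illustrative.

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML page standing in for a downloaded web page.
html = """
<html>
  <body>
    <h1>Python Libraries</h1>
    <ul id="libs">
      <li><a href="https://pandas.pydata.org">pandas</a></li>
      <li><a href="https://numpy.org">NumPy</a></li>
      <li><a href="https://matplotlib.org">Matplotlib</a></li>
    </ul>
  </body>
</html>
"""

# Build the parse tree with Python's built-in "html.parser" backend.
soup = BeautifulSoup(html, "html.parser")

# Navigate the tree: grab the text of the first <h1> tag.
title = soup.h1.get_text(strip=True)

# Search the tree: collect (link text, URL) for every <a> inside the list with id="libs".
links = [
    (a.get_text(strip=True), a["href"])
    for a in soup.find("ul", id="libs").find_all("a")
]

print(title)
for name, url in links:
    print(f"{name}: {url}")
```

In a real scraper the HTML string would come from a fetched page, while the tree navigation and searching would stay the same.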

Python Libraries for Data Cleaning and Manipulation

1. Pandas: Pandas is one of the most popular Python libraries. It is designed specifically for data manipulation and analysis, and provides features such as joining and merging datasets, inserting and deleting columns, filtering data, and reshaping datasets.

2. NumPy: Like pandas, NumPy is a very popular library, in this case for numerical operations. It supports large multi-dimensional arrays and matrices, and brings in high-level mathematical functions to work with these arrays and matrices.

3. PyOD: PyOD is used for detecting outliers in data. Outliers are extremely small or large values that differ significantly from the majority of the data.

4. spaCy: spaCy is a useful and flexible Natural Language Processing (NLP) library and framework, which can be used to clean text documents before building models. spaCy is fast compared to other libraries used for similar tasks.
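The short sketch below walks through the pandas operations listed above (merging, inserting and deleting columns, filtering, reshaping) and then hands the result to NumPy for array math. The tables, column names and values are hypothetical, made up only for illustration.

```python
import numpy as np
import pandas as pd

# Two small, hypothetical tables.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "revenue": [100.0, 120.0, 80.0, 95.0],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Joining/merging: attach each store's region to its sales rows.
merged = sales.merge(stores, on="store", how="left")

# Column insertion, then deletion.
merged["revenue_k"] = merged["revenue"] / 1000
merged = merged.drop(columns=["revenue_k"])

# Filtering: keep only rows with revenue above 90.
high = merged[merged["revenue"] > 90]

# Reshaping: one row per store, one column per month (columns come out sorted: Feb, Jan).
wide = merged.pivot(index="store", columns="month", values="revenue")

# NumPy: the pivoted values as a 2-D array, plus vectorized math on it.
matrix = wide.to_numpy()          # shape (2, 2)
col_means = matrix.mean(axis=0)   # mean revenue per month
product = matrix @ matrix.T       # matrix multiplication

print(wide)
print(col_means)
```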

Python Libraries for Data Visualization

1. Matplotlib: Matplotlib is the most popular data visualization library in Python. It can generate plots of all kinds, such as histograms and 3-D graphs.

2. Seaborn: Seaborn is a plotting library built on top of Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Many plots that Matplotlib can draw, Seaborn produces with less code and more visually appealing default styles.

3. Bokeh: Bokeh is an interactive visualization library that targets modern web browsers for presentation. It provides elegant construction of versatile graphics, including for large datasets.

4. Plotly: Plotly is a very powerful data visualization library. It offers interactive charts and animations, so a single plot can convey a lot of information: the trends in the data and how it rises, falls, and moves over a period of time.

5. Altair: Altair is a declarative visualization library based on the Vega-Lite grammar. It works well in Python, and one can use it to build meaningful plots that convey insights from the data.

Difference between Altair and Plotly: https://youtu.be/qv_p_tVS2gw
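As a minimal example of the Matplotlib histogram mentioned above, the sketch below plots 1,000 synthetic samples drawn with NumPy. It assumes the non-interactive `Agg` backend and saves the figure to an in-memory PNG, so it also works on a machine without a display; the data and styling choices are arbitrary.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: 1,000 samples from a standard normal distribution.
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

fig, ax = plt.subplots(figsize=(6, 4))
# ax.hist returns the bin counts, the bin edges and the drawn bar patches.
counts, edges, patches = ax.hist(data, bins=20, color="steelblue", edgecolor="white")
ax.set_title("Histogram of 1,000 normal samples")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")

# Save to an in-memory PNG; fig.savefig("hist.png") would write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```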

Python Libraries for Modeling

1. Scikit-learn: Scikit-learn supports the common operations performed in machine learning, such as classification, regression, clustering, and model selection. You name it, and scikit-learn most likely has a module for that.

2. TensorFlow: Developed by Google, TensorFlow is a popular deep learning library that helps you build and train different models. It is an end-to-end open source platform that provides easy model building, robust machine learning production, and powerful experimentation tools and libraries.

3. PyTorch: PyTorch can serve as a replacement for NumPy that uses the power of GPUs, and as a deep learning research platform that provides maximum flexibility and speed.
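Here is a small scikit-learn sketch covering two of the operations mentioned above: classification and model selection. It uses the Iris dataset bundled with scikit-learn and logistic regression as one possible classifier; both choices are just for illustration, and any other scikit-learn estimator could be swapped in.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Iris ships with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for evaluation; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Classification: fit a logistic regression model and score it on the held-out rows.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Model selection: 5-fold cross-validation gives a more stable estimate.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)

print(f"Test accuracy: {accuracy:.2f}")
print(f"Mean 5-fold CV accuracy: {scores.mean():.2f}")
```

Every scikit-learn estimator follows the same `fit`/`predict` pattern, which is why trying a different model usually means changing only one line.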

Python Libraries for Data Interpretability

A Data Scientist should be able to answer how a model works and why it came up with its results. The following libraries help with that.

1. LIME: LIME (Local Interpretable Model-agnostic Explanations) is an algorithm (and library) that can explain the predictions of any classifier or regressor.

2. H2O: H2O.ai is a well-known company in automated machine learning. Its product H2O Driverless AI offers simple data visualization techniques for representing high-degree feature interactions and nonlinear model behavior. It provides Machine Learning Interpretability (MLI) through visualizations that clarify modeling results and the effect of features in a model.

For more details and information, watch the video:

I hope the article was helpful. Do comment and like to show your appreciation.
