• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

ydataai/pandas-profiling: Create HTML profiling reports from pandas DataFrame ob ...

原作者: [db:作者] 来自: 网络 收藏 邀请

开源软件名称:

ydataai/pandas-profiling

开源软件地址:

https://github.com/ydataai/pandas-profiling

开源编程语言:

Python 91.4%

开源软件介绍:

pandas-profiling

Pandas Profiling Logo Header

Build Status Code Coverage Release Version Python Version Code style: black

Documentation | Slack | Stack Overflow | Latest changelog

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding.

For each column, the following information (whenever relevant for the column type) is presented in an interactive HTML report:

  • Type inference: detect the types of columns in a DataFrame
  • Essentials: type, unique values, indication of missing values
  • Quantile statistics: minimum value, Q1, median, Q3, maximum, range, interquartile range
  • Descriptive statistics: mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
  • Most frequent and extreme values
  • Histograms: categorical and numerical
  • Correlations: high correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik)
  • Missing values: through counts, matrix, heatmap and dendrograms
  • Duplicate rows: list of the most common duplicated rows
  • Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrilic)
  • File and Image analysis: file sizes, creation dates, dimensions, indication of truncated images and existance of EXIF metadata

The report contains three additional sections:

  • Overview: mostly global details about the dataset (number of records, number of variables, overall missigness and duplicates, memory footprint)
  • Alerts: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, between others)
  • Reproduction: technical details about the analysis (time, version and configuration)

Looking for a Spark backend to profile large datasets? It's work in progress.

Interested in uncovering temporal patterns? Check out popmon.

▶️ Quickstart

Start by loading your pandas DataFrame as you normally would, e.g. by using:

import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

To generate the standard profiling report, merely run:

profile = ProfileReport(df, title="Pandas Profiling Report")

Using inside Jupyter Notebooks

There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report.

Notebook Widgets

The above is achieved by simply displaying the report as a set of widgets. In a Jupyter Notebook, run:

profile.to_widgets()

The HTML report can be directly embedded in a cell in a similar fashion:

profile.to_notebook_iframe()

HTML

Exporting the report to a file

To generate a HTML report file, save the ProfileReport to an object and use the to_file() function:

profile.to_file("your_report.html")

Alternatively, the report's data can be obtained as a JSON file:

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")

Using in the command line

For standard formatted CSV files (which can be read directly by pandas without additional settings), the pandas_profiling executable can be used in the command line. The example below generates a report named Example Profiling Report, using a configuration file called default.yaml, in the file report.html by processing a data.csv dataset.

pandas_profiling --title "Example Profiling Report" --config_file default.yaml data.csv report.html

Additional details on the CLI are available on the documentation.


鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
上一篇:
learn-co-students/html5-semantic-containers-code-along-v-000发布时间:2022-06-18
下一篇:
learn-co-students/html5-video-embed-code-along-v-000发布时间:2022-06-18
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap