• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

gregtam/jupyter-notebook-initializer: Default notebook template and auxiliary fi ...

原作者: [db:作者] 来自: 网络 收藏 邀请

开源软件名称:

gregtam/jupyter-notebook-initializer

开源软件地址:

https://github.com/gregtam/jupyter-notebook-initializer

开源编程语言:

Jupyter Notebook 98.8%

开源软件介绍:

jupyter-notebook-initializer

The goal of this repository is to serve as a template for any new projects to avoid the hassle of configuring all the options each time a new project is started. The file notebook_initializer.ipynb serves as a starting point for any Jupyter notebook by importing useful libraries and setting up default settings (e.g., plotting settings, SQL magic functions).

Files

Notebooks

  • notebook_initializer.ipynb: This notebook is a template for any new notebook. It should be copied when creating a new notebook so that all of the libraries are already typed in along with any changes to the default matplotlib settings and magic commands to interact with SQL. Alternatively, we can simply 'source' this notebook from another Jupyter notebook by running %run notebook_initializer.ipynb. This helps us keep the main notebook cleaner by having fewer cells.

Python Files

  • credentials.py: This file includes login information into an MPP database. It is important to keep these separate from the notebook so that login information is not present inside of the notebook.

Submodules

  • python-utils: A repository containing utility functions for common machine learning tasks such as plotting, manipulating data, and accessing a SQL database.

  • mpp-plotting: A repository containing functions for plotting using an MPP database. For more info, check out my blog post.

Examples

We now show some examples of how to use these magic functions.

Executing SQL Queries

We can execute SQL commands simply by typing regular SQL code and putting an %%execsql at the very top of the cell.

%%execsql
DROP TABLE IF EXISTS example_data_table;
CREATE TABLE example_data_table
   AS SELECT random()
        FROM generate_series(1, 100);

Reading SQL Query Outputs

We can also read SQL query outputs and store them into a pandas DataFrame by putting %%readsql at the top of the cell. This will output the DataFrame below the cell. It also stores it into a pandas DataFrame called _df by default.

%%readsql
SELECT *
  FROM example_data_table;

If the outputted DataFrame has too many rows, we can also look at the head by typing %%readsql -h # where # is an integer value. This will show the first # rows in the DataFrame, but store the entire DataFrame into _df.

%%readsql -h 10
SELECT *
  FROM example_data_table;

Additionally, if we wanted to store this table to a different DataFrame, we can include this at the end of the top line of the cell.

%%readsql -h 10 other_df
SELECT *
  FROM example_data_table;

One final possibility is to use string formatting. Suppose we have the name of the table as a variable in Python.

table_name = 'example_data_table'

We can then select use this variable in our SQL query string.

%%readsql -h 10 other_df
SELECT *
  FROM {table_name};

Printing SQL Queries

While writing our SQL queries with string formatting, we may want to view the query beforehand as it there may be errors. We want to do this via the %%printsql magic function.

%%printsql
SELECT *
  FROM {table_name};

Autofill Table Names

When tables are created, the refresh_tables function is run. This will place the table name in the namespace as a string. This will give us tab completion for our table names.

Threading/Background Processes

Threading allows us to run two or more cells concurrently. This is useful if we have a long query running, but want to test other stuff inside of the same notebook and use the same variables. We make use of the %%background magic function to execute a cell in the background. The magic function will micro-manage these processes by assigning them with IDs. If IDs are not descriptive enough in identifying what the code is doing, there is also an option to append a comment to the thread.

A simple background thread being called:

%%background
some_code_here

A background thread with a comment:

%%background comment text goes here
some_code_here

We can also separate a cell of code into separate sections, which will individually run in parallel in the background. We do this using lines that begin with two or more '#' characters. Any following characters in the line go towards the thread's comment.

%%background Multi-thread
## First thread
some_code_here
## Second thread
some_other_code_here

The notebook will automatically create a ThreadManager called thread_manager, which is what the %%background magic function accesses. We can view all current and past threads by calling the thread_manager.

thread_manager()

This returns a DataFrame showing all completed and uncompleted threads. It also displays their start and end times, the total time for the code to run, and the comment. If there was an error in the code, then it will display that instead of the end time and completion time.

We can also filter and show only finished or unfinished threads by running:

thread_manager.get_finished_threads()
thread_manager.get_unfinished_threads()

Again, these return DataFrames.




鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap