Building reports from the Augur database schema is an important way to catalog the questions our users ask and to learn how we can make the most of the extensive, validated data Augur gathers. Not everything is "dashboard material". Over time, reports people find useful will become APIs in Augur that can be used to construct automated community reports. This is a repository for engaged requirements gathering in the agile spirit.
The templates directory is where we begin to understand what the reports "are". We expect individual users to create their own directories where they make new reports, work with us to create new reports, and then share them with the CHAOSS community. The goal is then to build generalized Augur queries, APIs, and similar tools, and make them available to everyone.
Install geckodriver for your platform if you want to write annotated PNG files out. This is a great way to automate report generation!
osx: brew install geckodriver
Linux, Windows: Download the latest geckodriver release for your platform from https://github.com/mozilla/geckodriver/releases and follow installation instructions. You can also get source code from that link.
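After installing, it can help to confirm that geckodriver is actually reachable before a notebook tries to export PNG files. The following is a minimal sketch using only the Python standard library; the helper name find_geckodriver is hypothetical and not part of this repository.

```python
import shutil


def find_geckodriver(executable="geckodriver"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_geckodriver()
if path is None:
    print("geckodriver was not found on PATH; PNG export will probably not work")
else:
    print(f"geckodriver found at {path}")
```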
Clone your fork of the repository. Typically, the command will take this form: git clone https://github.com/<your-github-username>/augur-community-reports. The only exception is a fork created under a GitHub organization. In that case, replace your-github-username with the name of the organization where you created the fork.
Create your python virtual environment wherever you routinely store them. We use a virtualenvs directory.
Change into the directory of your clone
cd augur-community-reports
Install the necessary Python libraries (requires Python 3.9 or later)
pip install -r requirements.txt
Run Jupyter Lab
jupyter lab
The information below is for advanced users.
Create a read only user on your augur database, like this:
CREATE USER chaoss WITH PASSWORD 'port88';
GRANT CONNECT ON DATABASE augur TO chaoss;
GRANT USAGE ON SCHEMA augur_data TO chaoss;
GRANT USAGE ON SCHEMA spdx TO chaoss;
GRANT USAGE ON SCHEMA augur_operations TO chaoss;
GRANT SELECT ON ALL TABLES IN SCHEMA augur_data TO chaoss;
GRANT SELECT ON ALL TABLES IN SCHEMA spdx TO chaoss;
GRANT SELECT ON ALL TABLES IN SCHEMA augur_operations TO chaoss;
ALTER DEFAULT PRIVILEGES IN SCHEMA augur_data
GRANT SELECT ON TABLES TO chaoss;
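If you need the same read-only grants for a different user or set of schemas, the statements above can be generated rather than typed by hand. This is only a sketch: the function name read_only_grants is hypothetical, it does not create the user, and it interpolates identifiers without quoting, so it should only be used with names you trust. Unlike the statements above, it emits ALTER DEFAULT PRIVILEGES for every schema, which is an assumption about what you may want.

```python
def read_only_grants(user, database, schemas):
    """Build GRANT statements giving `user` read-only access to `schemas`."""
    statements = [f"GRANT CONNECT ON DATABASE {database} TO {user};"]
    for schema in schemas:
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {user};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};")
        # Assumption: tables created later in this schema should be readable too.
        statements.append(
            f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT SELECT ON TABLES TO {user};"
        )
    return statements


for statement in read_only_grants("chaoss", "augur", ["augur_data", "spdx", "augur_operations"]):
    print(statement)
```

The printed statements can be pasted into psql and run by a role that owns the schemas.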
Augur Database Credentials
There are two directories the project starts with:
CHAOSS-Example, which is an example against a publicly available Augur database of the CHAOSS Project's organization on GitHub and
templates, which is a copy of the notebooks found in CHAOSS-Example. We intend you to copy this directory for your own project. On most Linux-based systems you can do that by running cp -R templates my-project-name (replace my-project-name with a meaningful project name).
In your new directory, edit the config.json file in a text editor so that it contains credentials for your Augur database.
In the directory where you want to run Jupyter Lab from, create a file called "config.json":
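As a rough illustration, the file needs the five keys that the notebook setup code reads: user, password, host, port, and database. The sketch below writes such a file from Python. Every value is a placeholder to replace with the details of your own Augur database; 5432 is simply the PostgreSQL default port.

```python
import json

# Placeholder values only: replace each one with your own database details.
config = {
    "user": "chaoss",
    "password": "your-password",
    "host": "localhost",
    "port": 5432,
    "database": "augur",
}

# Write the file into the current working directory.
with open("config.json", "w") as config_file:
    json.dump(config, config_file, indent=4)
```

Because the file holds a password, consider keeping it out of version control.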
From your augur-community-reports home directory, with your local python virtual environment activated, and requirements installed, run the command: jupyter lab.
If you start a NEW Jupyter notebook rather than copying one of the existing notebooks, place this code in the first cell:
import psycopg2
import pandas as pd
import sqlalchemy as salc
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import datetime
import json
warnings.filterwarnings('ignore')
with open("config.json") as config_file:
    config = json.load(config_file)
database_connection_string = 'postgresql+psycopg2://{}:{}@{}:{}/{}'.format(config['user'], config['password'], config['host'], config['port'], config['database'])
dbschema='augur_data'
engine = salc.create_engine(
    database_connection_string,
    connect_args={'options': '-csearch_path={}'.format(dbschema)})
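One caveat with building the connection string by plain formatting: a password containing characters such as @ or : can make the URL ambiguous. A possible workaround, sketched here under the assumption that your config.json uses the same five keys, is to percent-encode the user and password first. The helper name build_connection_string is hypothetical.

```python
from urllib.parse import quote


def build_connection_string(config):
    """Build a PostgreSQL URL, percent-encoding the user and password."""
    return "postgresql+psycopg2://{}:{}@{}:{}/{}".format(
        quote(str(config["user"]), safe=""),
        quote(str(config["password"]), safe=""),
        config["host"],
        config["port"],
        config["database"],
    )


# Example with a made-up password that contains reserved characters.
example = {
    "user": "chaoss",
    "password": "p@ss:word",
    "host": "localhost",
    "port": 5432,
    "database": "augur",
}
print(build_connection_string(example))
```

The result can be passed to salc.create_engine in place of the formatted string.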
Running New Contributor and Pull Request Templates
Using the Control Cell
In both the pull request and contributor templates the control cell is used to configure what is in the report.
Variables in Both Templates
Repo_set: Takes a list of repo_ids you need visualizations for.
Display_grouping: Can be set as 'repo' or 'competitors'. 'repo' groups the visualizations by repo, and 'competitors' groups the visualizations by chart, so data can be easily compared against other repos.
Not_aliased_repos: Takes a list of repo_ids you do not want aliased when display_grouping is set to 'competitors'.
Save_files: Can be set to True or False. When True, all the visualizations will be exported as PNGs.
Begin_date and End_date: Take a string in date form, e.g. '2020-03-30'
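A control cell using the shared variables might look roughly like the sketch below. The lowercase variable spellings and the repo_ids are assumptions for illustration, so check them against the control cell in your copy of the template. The check_control_cell helper is hypothetical and only guards against the most obvious mistakes.

```python
import datetime

# Hypothetical values: use repo_ids from your own Augur database.
repo_set = [25440, 25448]
display_grouping = 'competitors'   # 'repo' or 'competitors'
not_aliased_repos = [25440]        # only used when display_grouping is 'competitors'
save_files = False                 # True exports every visualization as a PNG
begin_date = '2020-01-01'
end_date = '2020-03-30'


def check_control_cell(display_grouping, begin_date, end_date):
    """Raise ValueError if the grouping or the date range is not usable."""
    if display_grouping not in ('repo', 'competitors'):
        raise ValueError("display_grouping must be 'repo' or 'competitors'")
    # fromisoformat raises ValueError for strings that are not YYYY-MM-DD dates.
    start = datetime.date.fromisoformat(begin_date)
    end = datetime.date.fromisoformat(end_date)
    if start > end:
        raise ValueError("begin_date must not be after end_date")


check_control_cell(display_grouping, begin_date, end_date)
```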
Variables for New Contributor Template
Group_by: Determines how data is grouped in bar charts. Can be set to 'year', 'quarter', or 'month'
Time and Num_contributions_required: Constraints for a repeat contributor. Together they indicate the number of contributions a contributor must make within the given time to be considered a repeat contributor. Time is in days.
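To make the repeat contributor rule concrete, here is one possible reading of it in plain Python. It assumes the time window starts at a contributor's first contribution, which may differ from how the template's queries actually count. The function name is_repeat_contributor is hypothetical.

```python
import datetime


def is_repeat_contributor(contribution_dates, time, num_contributions_required):
    """Return True if enough contributions fall within `time` days of the first one."""
    if not contribution_dates:
        return False
    ordered = sorted(contribution_dates)
    # Assumption: the window opens at the first contribution and lasts `time` days.
    cutoff = ordered[0] + datetime.timedelta(days=time)
    within_window = [d for d in ordered if d <= cutoff]
    return len(within_window) >= num_contributions_required


dates = [
    datetime.date(2020, 1, 1),
    datetime.date(2020, 2, 15),
    datetime.date(2021, 1, 1),
]
print(is_repeat_contributor(dates, time=365, num_contributions_required=2))  # True
print(is_repeat_contributor(dates, time=30, num_contributions_required=2))   # False
```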
Variables for Pull Request Template
scatter_plot_outliers_removed: Indicates the number of outliers you would like to remove on the days_to_first_response scatter plot. When you have a small number of outliers, this variable is useful for improving the utility of the visualizations.
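The effect of this setting can be pictured with a short sketch. It assumes that "outliers" means the largest days_to_first_response values, which is an interpretation rather than something confirmed by the template. The helper name remove_largest_outliers and the sample numbers are made up.

```python
def remove_largest_outliers(values, scatter_plot_outliers_removed):
    """Return `values` without the n largest entries, keeping the original order."""
    n = scatter_plot_outliers_removed
    if n <= 0:
        return list(values)
    # Find the positions of the n largest values, then drop those positions.
    largest = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:n]
    drop = set(largest)
    return [v for i, v in enumerate(values) if i not in drop]


days_to_first_response = [0.5, 2.0, 1.0, 90.0, 3.5, 240.0]
print(remove_largest_outliers(days_to_first_response, 2))  # [0.5, 2.0, 1.0, 3.5]
```

With the two extreme values gone, the remaining points spread across the plot instead of being squeezed against one axis.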
Notebooks in the initial release were authored by Andrew Brain, Sean Goggins, Dawn Foster, and Gabe Heim. Don't believe everything you see in a commit history. ;)
Augur is free software: you can redistribute it and/or modify it under the terms of the MIT License as published by the Open Source Initiative. See the LICENSE file for more details.
This work has been funded through the Alfred P. Sloan Foundation, Mozilla, the Reynolds Journalism Institute, and nine Google Summer of Code students.