A Python library for creating fast, repeatable and self-documenting data analysis pipelines.

proof is a Python library for creating optimized, repeatable and self-documenting data analysis pipelines. proof was designed to be used with the agate data analysis library, but it can be used with numpy, pandas or any other method of processing data.
Information
Category: Python / Data Analysis
Watchers: 11
Stars: 223
Forks: 20
Last update: Oct 15, 2021
