Data Analysis

Libraries for data analysis.

Newest releases

ryxcommar Implementation of Stata's tabulate command in Pandas for easy-to-type one-way and two-way tabulations.

renatootescu Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.

vopani 100 datatable exercises, organized into sections and structured as a course or set of tutorials, for beginners, intermediate users, and experts.

daleroberts Implementation in R of the Black-Scholes formula and some of the Greeks.

CITF-Malaysia Official data on Malaysia's National COVID-19 Immunisation Programme (PICK). Powered by MySejahtera.

tuplex Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python inter

mrT4ntr4 MODeflattener deobfuscates control flow flattened functions obfuscated by OLLVM using Miasm.

pyscaffold PyScaffold extension tailored for Data Science projects. This extension is inspired by cookiecutter-data-science and enhanced in many ways.

mlcraft-io Low-code business intelligence tool and data science workflow; an open-source Looker alternative.

escape2020 ESCAPE data science summer school 2021

coolbutuseless The goal of the gluestick package is to provide a home for a single, simple function (also named gluestick).

delta-io Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use.

AlbertoAlmuinha The Tidymodels Extension for Time Series Boosting Models

dinghino A simple modular tool to fetch and parse data related to the stock market.

capitalone Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

capitalone Loading Data with a single command, the library automatically formats & loads files into a DataFrame. Profiling the Data, the library identifies the schema, statistics, entities and more. Data Profiles can then be used in downstre

JohnMcCambridge Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column within a dataset and outputs an HTML file noting which tests trigger per column, alongside relevant outputs.

nv-legate Legate NumPy is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the NumPy API on top of the Legion runtime. Using Legate NumPy you can do things like run the final example of the Python CFD

alpha-miner Python tools for finance, including indicator calculation, business-day calculation, and so on.

rebecca-vickery A comprehensive list of free resources for learning data science

datacamp Viewflow is a framework built on top of Airflow that enables data scientists to create materialized views. It allows data scientists to f

cvdfoundation Kinetics is a collection of large-scale, high-quality datasets of URL links of up to 650,000 video clips that cover 400/600/700 human action classes, depending on the dataset version

slinderman STATS320: Statistical Methods for Neural Data Analysis

CloneTrooper1019 Roblox 0.3.368.0 This is a newly discovered build of Roblox, compiled in March of 2007. To run the client, download this repository as a zip and extra

GUDHI A set of Jupyter notebooks for practicing TDA with the Python Gudhi library together with popular machine learning and data science libraries.

princefishthrower A tiered chat app, based on Reddit account age, for all WallStreetBets users.

dataframehq whale is a lightweight data discovery, documentation, and quality engine for your data warehouse.

fivethirtyeight A FiveThirtyEight/The Marshall Project effort to collect comprehensive data on police misconduct settlements from 2010-19.

Androz2091 🌀 What's really in your Discord Data package?

DerwenAI Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

therealsreehari This repository is a combination of different resources lying scattered all over the internet. The reason for making such a repository is to combine all the valuable resources in a sequential manner, so that it helps every beginn

Seagate CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

AutoViML Use advanced feature engineering strategies and select the best features from your data set fast with a single line of code.