Data Analysis

Libraries for data analysis.

Newest releases

juliasilge tidytext: Text mining using tidy tools
 

khuyentran1401 Efficient Python Tricks and Tools for Data Scientists
 

TheEconomist The Economist's excess deaths model: This repository contains the replication code and data for The Economist's excess deaths model, used to estimate t
 

avito-tech Framework for creating efficient data processing pipelines.
 

coolbutuseless numberwang will convert floating point numbers (and integers) to their word representations, and vice versa.
 

mdecrevoisier Set of mind maps providing a detailed overview of the different Windows auditing capabilities and event log files.
 

SciML PreallocationTools.jl is a set of tools for helping build non-allocating pre-cached functions for high-performance computing in Julia.
 

posthog PostHog is an open-source product analytics suite, built for developers. Automate the collection of every event on your website or app, with no need to send data to 3rd parties.
 

mapillary Mapillary Street-level Sequences (MSLS) is a large-scale long-term place recognition dataset that contains 1.6M street-level images.
 

awslabs DenseClus is a Python module for clustering mixed type data using UMAP and HDBSCAN. Allowing for both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.
 

Amaguk2023 This is my first Data Engineering project. It extracts data from the user's recently played tracks using Spotify's API, transforms the data, and then loads it into PostgreSQL using a SQLAlchemy engine. Data is shown as a Spark DataFrame
 

MoH-Malaysia Official data on the COVID-19 epidemic in Malaysia. Powered by CPRC, CPRC Hospital System, MKAK, and MySejahtera.
 

ryxcommar Implementation of Stata's tabulate command in Pandas for extremely easy to type one-way and two-way tabulations.
 

renatootescu Educational project on how to build an ETL (Extract, Transform, Load) data pipeline, orchestrated with Airflow.
 

vopani 100 datatable exercises across different sections, structured as a course or set of tutorials for beginners, intermediate users, and experts.
 

daleroberts Implementation in R of the Black-Scholes formula and some Greeks.
 

CITF-Malaysia Official data on Malaysia's National Covid-19 Immunisation Programme (PICK). Powered by MySejahtera.
 

tuplex Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python inter
 

mrT4ntr4 MODeflattener deobfuscates control flow flattened functions obfuscated by OLLVM using Miasm.
 

pyscaffold PyScaffold extension tailored for Data Science projects. This extension is inspired by cookiecutter-data-science and enhanced in many ways.
 

mlcraft-io Low-code business intelligence tool and data science workflow; an open-source Looker alternative.
 

escape2020 ESCAPE data science summer school 2021
 

coolbutuseless The goal of the gluestick package is to provide a home for a single, simple function (also named gluestick).
 

delta-io Delta Sharing is an open protocol for secure real-time exchange of large datasets, which enables organizations to share data in real time regardless of which computing platforms they use.
 

AlbertoAlmuinha The Tidymodels Extension for Time Series Boosting Models
 

dinghino A simple modular tool to fetch and parse data related to the stock market.
 

capitalone Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!
 

capitalone Loading Data with a single command, the library automatically formats & loads files into a DataFrame. Profiling the Data, the library identifies the schema, statistics, entities and more. Data Profiles can then be used in downstre
 

JohnMcCambridge Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column within a dataset, and outputs an HTML file noting which tests trigger per column, alongside relevant outputs.
 

nv-legate Legate NumPy is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the NumPy API on top of the Legion runtime. Using Legate NumPy you do things like run the final example of the Python CFD
 

alpha-miner Python tools for finance, including indicator calculation, business day calculation, and so on.
 

rebecca-vickery A comprehensive list of free resources for learning data science
 

datacamp Viewflow is a framework built on top of Airflow that enables data scientists to create materialized views. It allows data scientists to f