This Awesome List aims to provide an overview of open-source projects related to data engineering. This is a community effort: please contribute and send your pull requests to help grow this list! For a list including non-OSS too

Data science classes for computer science and engineering students. Developed a class curriculum, lesson plans, and instructions about how to manage data and create meaningful visualizations using Python, Pandas, Matplotlib, Sea

Write better data pipelines without having to learn a specialized framework. By adopting a convention over configuration philosophy, Ploomber streamlines pipeline execution, allowing teams to confidently develop data products.

ParaMonte is a serial/parallel library of Monte Carlo routines for sampling mathematical objective functions of arbitrary dimensions, in particular the posterior distributions of Bayesian models in data science.
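
To make "Monte Carlo sampling of an objective function" concrete, here is a minimal sketch of a random-walk Metropolis sampler in plain Python. It does not use or reproduce ParaMonte's API; the names `log_posterior` and `metropolis`, the one-dimensional standard-normal target, and the step size are all assumptions chosen for illustration.

```python
import math
import random


def log_posterior(x):
    # Hypothetical target: log-density of a standard normal, up to a constant.
    return -0.5 * x * x


def metropolis(log_target, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x = start
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        # Propose a move by adding Gaussian noise to the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p_new / p_old).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples


draws = metropolis(log_posterior, start=0.0, n_samples=20000)
mean = sum(draws) / len(draws)
variance = sum((d - mean) ** 2 for d in draws) / len(draws)
print(f"sample mean: {mean:.3f}, sample variance: {variance:.3f}")
```

Working with log-densities avoids numerical underflow, and only density ratios are needed, so the normalizing constant of the posterior never has to be computed.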

Upload a LaTeX error log file, or a LaTeX document, or an R Markdown document to this repository, and I will tell you which LaTeX packages you need to install in your local LaTeX distribution so you can compile your documents to P

This repository contains data on Coronavirus Disease 2019 (COVID-19) in New York City. Data are updated daily, with the exception of all tables in the testing and recent data folders, which are updated weekly on Thursday.

QCompute is a Python-based quantum software development kit (SDK). It provides a full-stack programming experience for advanced users via hybrid quantum programming language features and a high-performance simulator.

Start Data Science is a template to help you set up experiments. It brings structure to exploratory data analysis (EDA), through to feature extraction, modeling, and resultant outputs whether they're figures, reports, APIs, or app

Eiten is an open source toolkit that implements various statistical and algorithmic investing strategies such as Eigen Portfolios, Minimum Variance Portfolios, Maximum Sharpe Ratio Portfolios, and Genetic Algorithms based Portfoli
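
As an illustration of what a minimum variance portfolio is, the sketch below uses the closed-form solution for two assets with unconstrained weights that sum to one. This is not Eiten's implementation; the function names and the example variances are assumptions made for this example.

```python
def min_variance_weights(var_a, var_b, cov_ab):
    """Weights (w_a, w_b), summing to 1, that minimize two-asset portfolio variance."""
    # The denominator equals Var(A - B); it is zero only in degenerate cases.
    denom = var_a + var_b - 2.0 * cov_ab
    if denom <= 0.0:
        raise ValueError("degenerate covariance: no unique minimum")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1.0 - w_a


def portfolio_variance(w_a, w_b, var_a, var_b, cov_ab):
    """Variance of the portfolio w_a * A + w_b * B."""
    return w_a ** 2 * var_a + w_b ** 2 * var_b + 2.0 * w_a * w_b * cov_ab


# Hypothetical inputs: two uncorrelated assets with variances 0.04 and 0.09.
w_a, w_b = min_variance_weights(0.04, 0.09, 0.0)
print(w_a, w_b, portfolio_variance(w_a, w_b, 0.04, 0.09, 0.0))
```

With more than two assets the same idea requires the full covariance matrix and a linear solve, and weights may come out negative (short positions) unless constraints are added.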

The fitter package provides a simple class to identify the distribution from which a data sample is generated. It uses 80 distributions from SciPy and allows you to plot the results to check what is the most probable distributio
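
The underlying idea, fitting several candidate distributions to the same sample and ranking them, can be sketched with the standard library alone. The code below is not the fitter API: it considers only two candidates (normal and exponential), fits them by maximum likelihood, and ranks them by log-likelihood, which is an assumed criterion chosen for simplicity. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_normal(data):
    """Maximum-likelihood normal fit; returns (params, log-likelihood)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # the MLE uses the population std
    if sigma == 0.0:
        raise ValueError("data has zero variance")
    loglik = sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )
    return {"mu": mu, "sigma": sigma}, loglik


def fit_exponential(data):
    """Maximum-likelihood exponential fit; returns (params, log-likelihood)."""
    mean = statistics.fmean(data)
    if min(data) < 0.0 or mean <= 0.0:
        # The exponential distribution has no support on negative values.
        return {}, float("-inf")
    rate = 1.0 / mean
    loglik = len(data) * math.log(rate) - rate * sum(data)
    return {"rate": rate}, loglik


def rank_candidates(data):
    """Return (name, (params, loglik)) pairs, best-fitting candidate first."""
    fits = {"normal": fit_normal(data), "exponential": fit_exponential(data)}
    return sorted(fits.items(), key=lambda item: item[1][1], reverse=True)


rng = random.Random(0)
sample = [rng.expovariate(2.0) for _ in range(2000)]
for name, (params, loglik) in rank_candidates(sample):
    print(name, params, round(loglik, 1))
```

Raw log-likelihood favors candidates with more parameters, so a penalized score such as AIC would be a fairer criterion when the candidates differ in complexity.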

handcalcs is a library to render Python calculation code automatically in LaTeX, but in a manner that mimics how one might format their calculation if it were written with a pencil: write the symbolic formula, followed by numeric