capitalone Capture all information throughout your model's development in a reproducible way and tie results directly to the model code!

capitalone Loading data: with a single command, the library automatically formats and loads files into a DataFrame. Profiling the data: the library identifies the schema, statistics, entities and more. Data Profiles can then be used in downstre

JohnMcCambridge Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column within a dataset and outputs an HTML file noting which tests trigger per column, alongside relevant outputs.

nv-legate Legate NumPy is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the NumPy API on top of the Legion runtime. Using Legate NumPy you can do things like run the final example of the Python CFD

alpha-miner Python tools for finance, providing indicator calculation, business day calculation, and so on.

rebecca-vickery A comprehensive list of free resources for learning data science

datacamp Viewflow is a framework built on top of Airflow that enables data scientists to create materialized views. It allows data scientists to f

cvdfoundation Kinetics is a collection of large-scale, high-quality datasets of URL links of up to 650,000 video clips that cover 400/600/700 human action classes, depending on the dataset version

slinderman STATS320: Statistical Methods for Neural Data Analysis

CloneTrooper1019 Roblox 0.3.368.0 This is a newly discovered build of Roblox, compiled in March of 2007. To run the client, download this repository as a zip and extra

GUDHI A set of jupyter notebooks for the practice of TDA with the python Gudhi library together with popular machine learning and data sciences libraries.

princefishthrower A tiered chat app, based on Reddit account age, for all WallStreetBets users.

dataframehq whale is a lightweight data discovery, documentation, and quality engine for your data warehouse.

fivethirtyeight A FiveThirtyEight/The Marshall Project effort to collect comprehensive data on police misconduct settlements from 2010-19.

Androz2091 🌀 What's really in your Discord Data package?

DerwenAI Graph-Based Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, RDFlib, pySHACL, RAPIDS, NetworkX, iGraph, PyVis, pslpython, pyarrow, etc.

therealsreehari This repository is a combination of different resources scattered all over the internet. The reason for making such a repository is to combine all the valuable resources in a sequential manner, so that it helps every beginn

Seagate CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

AutoViML Use advanced feature engineering strategies and select the best features from your data set fast with a single line of code.

aaronwangy A helpful 4-page data science cheatsheet to assist with exam reviews, interview prep, and anything in-between.

therealsreehari This repository consists of free resources needed for a person to learn data science from beginning to end. It is divided into four main parts.

liuhuanyong PersonGraphDataSet: nearly 10,000 person-to-person relationship facts

einsteinpy EinsteinPy is an open source, pure Python package dedicated to problems arising in General Relativity and gravitational physics, such as geodesics plotting for Schwarzschild, Kerr and Kerr-Newman space-time models, calculation of Sc

brianhie Language modeling of viral evolution

formlio Use ForML to formally describe a data science problem as a composition of high-level operators. ForML expands your project into a task dependency graph specific to a given life-cycle phase and executes it using any of its supporte

jm199504 A workflow for building a small financial knowledge graph

timkpaine Tributary is a library for constructing dataflow graphs in Python. Unlike many other DAG libraries in Python (airflow, luigi, prefect, dagster, dask, kedro, etc.), tributary is not designed with data/etl pipelines or scheduling in

moodymudskipper Use Dynamic Columns in Data Frames

mathiasbynens Historical data on COVID-19 vaccination doses administered in Germany, per state.

GiorgioComitini Italian Covid-19 vaccination campaign data

italia Open data on the delivery and administration of COVID-19 vaccines in Italy - Extraordinary Commissioner for the COVID-19 Emergency

cgpotts DynaSent is an English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.

hackingthemarkets A collection of open source server components and Python libraries for financial data projects and automated trading