PostHog is an open-source product analytics suite, built for developers. Automate the collection of every event on your website or app, with no need to send data to third parties.
DenseClus is a Python module for clustering mixed-type data using UMAP and HDBSCAN. Because it accepts both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.
This is my first data engineering project. It extracts the user's recently played tracks from Spotify's API, transforms the data, and then loads it into PostgreSQL using a SQLAlchemy engine. The data is shown as a Spark DataFrame.
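The extract, transform, and load steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the payload is a hand-written sample whose shape is assumed to resemble a recently-played response, SQLite from the standard library stands in for PostgreSQL and SQLAlchemy so the sketch runs anywhere, and the names `SAMPLE_RESPONSE`, `transform`, `load`, and `played_tracks` are hypothetical.

```python
import sqlite3

# Hand-written sample; the field layout is an assumption, not real API output.
SAMPLE_RESPONSE = {
    "items": [
        {
            "track": {"name": "Song A", "artists": [{"name": "Artist X"}]},
            "played_at": "2021-06-01T10:15:00.000Z",
        },
        {
            "track": {
                "name": "Song B",
                "artists": [{"name": "Artist Y"}, {"name": "Artist Z"}],
            },
            "played_at": "2021-06-01T10:20:00.000Z",
        },
    ]
}


def transform(payload):
    """Flatten the nested payload into (track, artists, played_at) rows."""
    rows = []
    for item in payload["items"]:
        track = item["track"]
        artists = ", ".join(artist["name"] for artist in track["artists"])
        rows.append((track["name"], artists, item["played_at"]))
    return rows


def load(rows, conn):
    """Insert rows, using played_at as the key so re-runs do not duplicate."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS played_tracks ("
        "track TEXT, artists TEXT, played_at TEXT PRIMARY KEY)"
    )
    conn.executemany("INSERT OR IGNORE INTO played_tracks VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory is a stand-in for the PostgreSQL target.
conn = sqlite3.connect(":memory:")
load(transform(SAMPLE_RESPONSE), conn)
load(transform(SAMPLE_RESPONSE), conn)  # second run inserts nothing new
stored = conn.execute(
    "SELECT track, artists FROM played_tracks ORDER BY played_at"
).fetchall()
print(stored)
```

Keying the table on the play timestamp is one simple way to make repeated runs idempotent; the actual project may handle duplicates differently.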
A set of 100 datatable exercises, organized into sections like a course or tutorial, for teaching and learning at beginner, intermediate, and expert levels.
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python inter
Delta Sharing is an open protocol for the secure, real-time exchange of large datasets. It lets organizations share data regardless of which computing platforms they use.
With a single command, the library automatically formats and loads files into a DataFrame. When profiling the data, the library identifies the schema, statistics, entities and more. Data Profiles can then be used in downstre
Flenser is a simple, minimal, automated exploratory data analysis tool. It runs a set of simple tests against each column within a dataset, and outputs an HTML file noting which tests trigger per column, alongside relevant outputs.
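The idea of running simple tests per column and reporting which ones trigger can be sketched as below. This is not Flenser's API or its actual test suite; the two checks, the names `run_checks` and `to_html`, and the sample data are all hypothetical and use only the Python standard library.

```python
import csv
import html
import io


def check_missing(values):
    """Trigger when the column contains empty cells."""
    missing = sum(1 for value in values if value == "")
    return f"{missing} missing value(s)" if missing else None


def check_constant(values):
    """Trigger when every cell in the column holds the same value."""
    return "only one distinct value" if len(set(values)) == 1 else None


# Hypothetical check registry: name -> function returning a message or None.
CHECKS = {"missing": check_missing, "constant": check_constant}


def run_checks(rows):
    """Return {column: [(check name, message), ...]} for triggered checks."""
    report = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        triggered = []
        for name, check in CHECKS.items():
            message = check(values)
            if message is not None:
                triggered.append((name, message))
        report[column] = triggered
    return report


def to_html(report):
    """Render the report as a minimal HTML table, one row per column."""
    lines = ["<table>", "<tr><th>column</th><th>triggered tests</th></tr>"]
    for column, triggered in report.items():
        notes = "; ".join(f"{name}: {message}" for name, message in triggered)
        lines.append(
            f"<tr><td>{html.escape(column)}</td><td>{html.escape(notes)}</td></tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)


SAMPLE_CSV = "id,city,flag\n1,Paris,y\n2,,y\n3,Oslo,y\n"
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = run_checks(rows)
page = to_html(report)
print(page)
```

Keeping each check as a small function that returns either a message or `None` makes it easy to add further tests without touching the reporting code.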
Legate NumPy is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the NumPy API on top of the Legion runtime. Using Legate NumPy, you can do things like run the final example of the Python CFD