PyQuery-based scraping micro-framework.

demiurge: PyQuery-based scraping micro-framework. Supports Python 2.x and 3.x.
Installing demiurge: $ pip install demiurge
Quick start: Define items to be sc
Category: Python / HTML Manipulation
Watchers: 11
Star: 109
Fork: 20
Last update: May 22, 2022

Related Repos

haosulab Learning to manipulate unseen objects from 3D visual inputs is crucial for robots to achieve task automation.

ropensci Convert latex math expressions to HTML for use in markdown documents or package manual pages. The rendering is done in R using the V8 engine, which eliminates the need for embedding the MathJax library in the web pages.

rschroll rmrl is a Python library for rendering reMarkable documents to PDF files. It takes the original PDF document and the files describing your annotations, combining them to produce a document close to what reMarkable itself would output.

ColdHeat pybluemonday is a library for sanitizing HTML very quickly via bluemonday.

felixonmars zhwiki dictionary for fcitx5-pinyin and rime

Knio Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure Python very concisely, which eliminates the need to learn another template lan

MagicStack httptools is a Python binding for the nodejs HTTP parser. The package is available on PyPI: pip install httptools. APIs httptools contains two classes httptools.HttpRequestParser, httptools.HttpResponseParser and a function f

postlight Mercury Parser - Extracting content from chaos The Mercury Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead im

psf Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When using this library you automatically get: Full JavaScript support! C

rushter selectolax A fast HTML5 parser and CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github: git clone --recursive

matiskay HTML Similarity This package provides a set of functions to measure the similarity between web pages. Install The quick way: pip install html-similarity How it works? Structural Simil

MechanicalSoup Home page Overview A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links

Hrabal TemPy Fast Object-Oriented HTML templating With Python! What? Build HTML without writing a single tag. TemPy dynamically generates HTML and accesses it in a pure Python, or jQuery fashion. Navi

googleprojectzero Domato: a DOM fuzzer, written and maintained by Ivan Fratric. Copyright 2017 Google Inc. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except

gaojiuli tomd When crawling online articles such as news and blogs, I want to save them as Markdown files rather than in a database. Tomd can convert HTML that was originally generated from Markdown. If a HTML can't be described by mar