pagefish
Search engine with a static webserver
Example usage
- Go to https://peterburk.github.io/pagefish/index.html
- Type a city name.
- Click search.
- See the results.
- (optional) Copy the relation ID to MeetYourMappers
How the hashing algorithm works:
1. Convert every character in the query to a number. (Peter = 80, 101, 116, 101, 114)
2. Add up the total of those numbers (80 + 101 + 116 + 101 + 114 = 512).
3. Get the Unicode character for that number (512 = Ȁ).
4. Load the index file that uses that character as a filename. (GeoNamesIndex/Ȁ.txt)
5. Grep the query again within that file (clashes exist, e.g. "Dover" = 68 + 111 + 118 + 101 + 114 = 512)
Why this is important:
+ The router and filesystem help the search.
+ Characters are just numbers. It's faster to search shorter strings.
+ Server-side operations cost money. Client-side JavaScript is free.
+ The index can be stored on a static file server e.g. Github.
Known limitations:
- Fuzzy matching isn't supported (but this is not a problem for Chinese)
- Some characters aren't allowed in filenames, URLs, etc.
- The hash algorithm should be balanced so that index files are a similar size.
Benchmarks:
time grep -c "Paris" geonames-relations-name-id-lat-lon.txt
0.382s = 382 ms
Typical search time on local: 30 ms.
Typical search time on Github: 200 ms.
This gives a 10x speedup compared to grep.
How to build the index:
cc -lm geonamesToIndex.c -o geonamesToIndex
time cat geonames_relations.txt | ./geonamesToIndex | ./geonamesToIndex
17.185s
Having many small files in the index uses more space. But storage is cheap.
Original file geonames-relations-name-id-lat-lon.txt: 25.8 MB
Index size GeoNamesIndex: 94 MB
About the author
Peter Burkimsher is currently working for OSE, a memory card manufacturer in Kaohsiung, Taiwan.
I did several projects during my 3.5 years here, including writing bit-level drivers to control memory card testing equipment, a Win32 app to parse log files and send them to a SQL server, network monitoring, and Big Data processing.
I studied Electronic Systems Engineering at Lancaster University and graduated with first-class honours in 2011, before going off on Working Holiday visas to see the world.
Now I have 3 years continuous relevant work experience, I’m looking for a suitable job in New Zealand so I can get the Skilled Migrant Category visa.
(Canada’s Express Entry and Australia are also options).