Overview
In INFO-I427 Search Informatics, we learn the principles of information retrieval (IR) and put those principles into practice by writing our own web search engines and a web crawler.
Table of Contents
- Introduction to Information Retrieval (IR)
- Using the I427 Python environment
- Adding a webpage to a corpus for web search
- Lists, sets, and tuples in Python
- Tokenizing and normalizing a document in a corpus
- Writing functions in Python
- The document index abstract data type (ADT)
- Dictionaries in Python
- Stemming and lemmatization
- Writing classes in Python
- The query engine in Boolean IR
- Writing modules in Python
- Ranked IR and its query engine
- Databases in Python
- Term frequency measures
- The Vector Space Model
- Document scoring based on links between pages
- Positional scoring of terms
- Scraping a webpage
- Crawling the web using Breadth-First Search (BFS)