Overview

In INFO-I427 Search Informatics, we learn the principles of information retrieval (IR) and put them into practice by writing our own web search engine and web crawler.

Table of Contents

  1. Introduction to Information Retrieval (IR)
  2. Using the I427 Python environment
  3. Adding a webpage to a corpus for web search
  4. Lists, sets, and tuples in Python
  5. Tokenizing and normalizing a document in a corpus
  6. Writing functions in Python
  7. The document index abstract data type (ADT)
  8. Dictionaries in Python
  9. Stemming and lemmatization
  10. Writing classes in Python
  11. The query engine in Boolean IR
  12. Writing modules in Python
  13. Ranked IR and its query engine
  14. Databases in Python
  15. Term frequency measures
  16. The Vector Space Model
  17. Document scoring based on links between pages
  18. Positional scoring of terms
  19. Scraping a webpage
  20. Crawling the web using Breadth-First Search (BFS)