Nugget of the week

tf-idf

a measure of relevance

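TF-IDF is a numerical statistic that reflects how important a word is to a document within a collection of documents. The score is the product of two factors: term frequency (how often the word appears in the document) and inverse document frequency (how rare the word is across the collection). A word scores high when it is frequent in one document but uncommon elsewhere.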

This measure is widely used in search engines to rank retrieved documents.
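
As a rough illustration of the product described above, here is a minimal Python sketch. It assumes one common variant, relative term frequency multiplied by the natural log of (number of documents / number of documents containing the term); real search engines typically use smoothed or normalized versions. The function name tf_idf and the sample documents are made up for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs is a list of token lists; returns one {term: score} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical sample collection.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

In this sample, "the" appears in every document, so its inverse document frequency is ln(1) = 0 and its score is zero everywhere, while "mat" appears in only one document and outranks "cat" in the first document.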

bloom filter

a probabilistic data structure

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.

It returns either "possibly in set" or "definitely not in set": false positives are possible, but false negatives are not. This data structure can be used when we need to check whether a username is already taken, whether a user has already read an article, and so on.
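
To make the two possible answers concrete, here is a minimal Python sketch of a Bloom filter. It is an illustration under stated assumptions, not a tuned implementation: the class name BloomFilter, the method might_contain, and the defaults size_bits=1024 and num_hashes=3 are all chosen for this example, and the hash positions are derived from salted SHA-256 digests purely for simplicity.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: tracking usernames that are already taken.
taken = BloomFilter()
for name in ["alice", "bob", "carol"]:
    taken.add(name)
```

Every name that was added is reported as possibly present, so a "definitely not" answer can be trusted without consulting the real user store. A "possibly" answer still needs a lookup in the authoritative source, since unrelated items can happen to map to bits that are already set.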