7.1.3.1. Defining precision/recall

Precision
= #(relevant items retrieved) / #(all retrieved items)
= tp / (tp + fp)
= |A ∩ B| / |B|

Recall
= #(relevant items retrieved) / #(all relevant items)
= tp / (tp + fn)
= |A ∩ B| / |A|

A is the set of relevant documents, B is the set of retrieved documents

	Relevant	Nonrelevant
Retrieved	True Positive tp	False Positive fp
Not Retrieved	False Negative fn	True Negative tn
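
The set definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `precision_recall` and the document ids are made up for the example, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    tp = len(relevant & retrieved)  # |A ∩ B|: relevant items that were retrieved
    # Assumed convention: an empty denominator yields 0.0 instead of an error.
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall


A = {"d1", "d2", "d3", "d4"}  # relevant documents (hypothetical ids)
B = {"d3", "d4", "d5"}        # retrieved documents (hypothetical ids)
p, r = precision_recall(A, B)
print(p, r)  # tp = 2, so precision = 2/3 and recall = 2/4
```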

Mean Average Precision

7.1.3.2. Harmonic Mean and F Measure

7.1.3.2.1. Pythagorean Mean:

arithmetic mean $A = \frac{x_1 + x_2 + \cdots + x_n}{n}$
geometric mean $G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
harmonic mean $H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i = 1} ^ n \frac{1}{x_i}}$
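
A small sketch of the three means, assuming strictly positive inputs (the function names and sample values are illustrative only). For positive numbers the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean. That is why the harmonic mean is pulled toward the smaller of two values.

```python
import math


def arithmetic_mean(xs):
    return sum(xs) / len(xs)


def geometric_mean(xs):
    # n-th root of the product; assumes all values are positive
    return math.prod(xs) ** (1 / len(xs))


def harmonic_mean(xs):
    # n divided by the sum of reciprocals; assumes no value is zero
    return len(xs) / sum(1 / x for x in xs)


xs = [0.2, 0.8]  # sample values, e.g. a low and a high score
am, gm, hm = arithmetic_mean(xs), geometric_mean(xs), harmonic_mean(xs)
print(am, gm, hm)  # approximately 0.5, 0.4, 0.32
```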

7.1.3.2.2. F Measure

The F measure is an aggregated performance score for evaluating algorithms and systems.
It is the harmonic mean of precision and recall.

$F = \frac{2}{\frac{1}{R} + \frac{1}{P}} = \frac{2RP}{R + P}$

$F_\beta = \frac{(\beta^2 + 1)RP}{R + \beta^2P}$

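$\beta$ is a parameter that controls the relative importance of recall and precision: $\beta > 1$ weights recall more heavily, $\beta < 1$ weights precision more heavily, and $\beta = 1$ gives the plain harmonic mean $F$.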
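
The effect of $\beta$ can be checked numerically. The sketch below implements the $F_\beta$ formula as written in this section. The name `f_beta`, the sample scores, and the choice to return 0.0 when the denominator is zero are assumptions made for the example.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta = (beta^2 + 1) * R * P / (R + beta^2 * P)."""
    b2 = beta * beta
    denom = recall + b2 * precision
    if denom == 0:
        return 0.0  # assumed convention when the score is undefined
    return (b2 + 1) * recall * precision / denom


P, R = 0.5, 1.0  # sample scores: low precision, perfect recall
f1 = f_beta(P, R)              # harmonic mean of P and R
f2 = f_beta(P, R, beta=2.0)    # recall counts more, score moves toward R
f05 = f_beta(P, R, beta=0.5)   # precision counts more, score moves toward P
print(f05, f1, f2)
```

With these sample scores recall is the higher of the two, so the score rises as $\beta$ grows.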

7.1.3.2.3. Calculating Recall/Precision at Fixed Positions

7.1.3.2.4. Average Precision of the Relevant Documents

7.1.3.2.5. Averaging Across Queries: Mean Average Precision (MAP)

Mean Average Precision (MAP)
{% math %} MAP = \frac{\sum_{q = 1} ^ Q AveP(q)}{Q} {% endmath %}
Q is the number of queries, AveP(q) is the average precision for query q
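
A sketch of MAP in Python. These notes do not spell out AveP(q), so the code assumes the common definition: take the precision at each rank where a relevant document appears, sum those values, and divide by the number of relevant documents for the query. The function names, rankings, and document ids are hypothetical.

```python
def average_precision(ranking, relevant):
    """AveP for one query (assumed definition, see the note above)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention for a query with no relevant documents
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranking, relevant) pairs, one pair per query."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AveP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AveP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(map_score)
```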

7.1.3.2.6. Difficulties in Using Precision/Recall

7.1.3.3. Discounted Cumulative Gain

$DCG_p = rel_1 + \sum_{i = 2} ^ p \frac{rel_i}{\log_2(i)}$

$p$ : the rank position up to which gain is accumulated
$rel_i$ : the graded relevance of the result at position $i$
The typical discount is $\frac{1}{\log(rank)}$, with base 2 in the formula above

The premise of DCG is that highly relevant documents appearing lower in a search result list should be penalized: the graded relevance value of each result is divided by the logarithm of its position, so the same document contributes less the further down the list it appears.
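
A minimal sketch of the DCG formula from this section, where rank 1 is undiscounted and rank $i \ge 2$ is divided by $\log_2(i)$. The function name `dcg` and the relevance grades are made up for illustration.

```python
import math


def dcg(rels, p=None):
    """DCG_p = rel_1 + sum_{i=2..p} rel_i / log2(i); rels is ordered by rank."""
    if p is not None:
        rels = rels[:p]
    return sum(
        rel if i == 1 else rel / math.log2(i)
        for i, rel in enumerate(rels, start=1)
    )


grades = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance, rank 1 first
score = dcg(grades)
print(round(score, 4))

# Moving a highly relevant result further down the list lowers the score.
print(dcg([3, 3, 2]) > dcg([2, 3, 3]))
```

Because $\log_2(2) = 1$, this form of the formula applies no discount at rank 2 either, so swapping the first two results does not change the score.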

7.1.3.4. How Evaluation is Done at Web Search Engines

Elements of Good Search Results

7.1.3.5. Google's Search Quality Guidelines

Understanding Mobile Users

7.1.3.5.1. six rating scale categories

7.1.3.5.2. 4-step process for changing their search algorithm

7.1.3.7. A/B testing

7.1.3.7. Using user clicks for evaluation

7.1.3.8. Using log files for evaluation

7.1.3.8.1. typical contents of the query log files

7.1.3.9. Google's enhancements for good search results

Search Engine Evaluation