A modern information retrieval system must have the capability to find, organize and present very different manifestations of information – such as text, pictures, videos or database records – any of which may be of relevance to the user. However, the concept of relevance, while seemingly intuitive, is actually hard to define, and it is even harder to model in a formal way.

Lavrenko does not attempt to bring forth a new definition of relevance, nor does he provide arguments as to why any particular definition might be theoretically superior or more complete. Instead, he takes a widely accepted, albeit somewhat conservative definition, makes several assumptions, and from them develops a new probabilistic model that explicitly captures that notion of relevance. With this book, he makes two major contributions to the field of information retrieval: first, a new way to look at topical relevance, which complements the two dominant models, i.e., the classical probabilistic model and the language modeling approach, and which explicitly combines documents, queries, and relevance in a single formalism; second, a new method for modeling exchangeable sequences of discrete random variables which does not make any structural assumptions about the data and which can also handle rare events.

Thus his book is of major interest to researchers and graduate students in information retrieval who specialize in relevance modeling, ranking algorithms, and language modeling.

[...] i.e., the semantic correspondence between a given request and a given document. We will not be addressing issues of presentation, novelty, or suitability to a particular task.

3 Existing Models of Relevance

This book is certainly not the first endeavor to treat relevance in probabilistic terms. Some of the more prominent examples are the 2-Poisson indexing model developed independently by Bookstein and Swanson [15, 16] and Harter [52], the probabilistic retrieval model of Robertson and Sparck Jones [117], the probabilistic flavors [123] of Van Rijsbergen’s logical model [139], the inference-network model developed by Turtle and Croft [135, 134], the language modeling approach pioneered by Ponte and Croft [106, 105] and further developed by others [90, 56], and the recent risk minimization framework of Zhai and Lafferty [68, 157].

However, since 1976 the model has been re-formulated a number of times to a degree where it can hardly be called “binary” and, as we shall argue later on, the term “independence” is also questionable. The model is also known as the Okapi model, the City model, or simply as the probabilistic model. A very detailed account of the recent state of the model is provided by the original authors in [131, 132]. What follows is our interpretation of the model. An attentive reader may find that our description is different in two ways from the way the model is presented by the authors [117, 131, 132].

2. Requests. We are concerned only with observable natural-language requests, for which we can obtain relevance judgments. However, our model will involve a notion similar to the real information need (RIN), which will play the role of a latent variable. We will use the words “query” and “request” interchangeably.

3. Non-interactive. We will not be modeling any evolution of the user’s information need. Our model explicitly accounts for the fact that a single information need can be expressed in multiple forms, but we do not view these in the context of an interactive search session.

