Bibliometrics

Bibliometrics is the study, or measurement, of texts and information. Content analysis is one type of bibliometrics. Although bibliometrics is most often used in library and information science, it has wide applications in other areas.

Historically, bibliometric methods have been used to trace relationships among academic journal citations. Citation analysis, which involves examining the documents that refer to an item, is used in searching for materials and analyzing their merit. Citation indexes, such as the Institute for Scientific Information's Web of Science, allow users to search forward in time from a known article to more recent publications that cite it.
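
The forward search described above can be pictured as inverting reference lists: each paper lists what it cites, and the index records, for each paper, what cites it. The following is a minimal sketch of that idea only, not a description of how Web of Science is built; the paper identifiers and the function name `build_citation_index` are invented for illustration.

```python
from collections import defaultdict

def build_citation_index(references):
    """Invert {paper: [papers it cites]} into {paper: [papers that cite it]}."""
    cited_by = defaultdict(list)
    for paper, cited in references.items():
        for target in cited:
            cited_by[target].append(paper)
    # Sort each list so lookups return a stable order.
    return {target: sorted(citing) for target, citing in cited_by.items()}

# Hypothetical reference lists: each key cites the papers in its list.
references = {
    "A1990": [],
    "B1995": ["A1990"],
    "C2001": ["A1990", "B1995"],
    "D2004": ["B1995"],
}

index = build_citation_index(references)
print(index["A1990"])  # ['B1995', 'C2001']
```

Looking up a known article in `index` returns the later publications that cite it, which is the forward search a citation index provides.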

Data from citation indexes can be analyzed to determine the popularity and impact of specific articles, authors, and publications. Using citation analysis to gauge the importance of one's work, for example, is a significant part of the tenure review process. Information scientists also use citation analysis to quantitatively assess the core journal titles and watershed publications in particular disciplines; interrelationships between authors from different institutions and schools of thought; and related data about the sociology of academia.

Although citation analysis is nothing new (the Science Citation Index began publication in 1961), greater computing power is making it more useful and widespread. Google's PageRank is based on the principle of citation analysis.
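
The link between PageRank and citation analysis can be illustrated by treating citations as links and letting each paper pass a share of its score to the papers it cites. The sketch below uses a simplified textbook form of PageRank computed by power iteration; it is not Google's production algorithm, and the damping factor of 0.85, the iteration count, and the four-paper graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over {node: [nodes it links to]} by power iteration."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the same baseline share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the nodes it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node that cites nothing spreads its rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical citation graph: each key cites the papers in its list.
citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["B"]}
ranks = pagerank(citations)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

In this toy graph, paper A ranks highest because it is cited by B, which is itself cited by two papers: a citation from a well-cited paper counts for more than one from an uncited paper.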

Other bibliometric applications include creating thesauri, measuring term frequencies, and exploring the grammatical and syntactic structures of texts.
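
Measuring term frequencies is the simplest of these to illustrate. The sketch below counts word occurrences in a short invented sentence; the tokenization rule (lowercase runs of the letters a to z) and the function name `term_frequencies` are assumptions made for the example, and real studies would typically add stemming or stop-word handling.

```python
import re
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase alphabetic word occurs in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)

# Invented sample text for illustration.
sample = "Citation analysis counts citations; citation indexes record each citation."
freqs = term_frequencies(sample)
print(freqs.most_common(1))  # [('citation', 3)]
```

Note that "citation" and "citations" are counted as different terms here, which is one reason practical term-frequency work usually normalizes word forms first.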

The h-index
The h-index is a number suggested by Jorge E. Hirsch in 2005 to quantify the scientific output of individual authors. It is based on the number of citations that each of an author's papers receives. Hirsch writes:
 * A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np - h) papers have fewer than h citations each.
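
The definition can be turned into a short computation: sort the citation counts in descending order and find the last position at which the count is still at least as large as the position. The sketch below takes the common reading of the definition, the largest h such that h papers have at least h citations each; the function name `h_index` and the citation counts are invented for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
print(h_index([0, 0]))            # 0
```

In the first example, four papers have at least 4 citations each and the remaining paper has 3, so h is 4.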

This number has several advantages over other criteria. For example, compared with the total number of citations, it is not very sensitive to a single paper that has many citations.
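
To make the comparison concrete, the sketch below contrasts the total citation count with h for two invented citation records: one dominated by a single heavily cited paper and one with steady citations across many papers. All numbers are hypothetical, and the helper `h_index` is a compact illustration that reads h as the largest number of papers with at least that many citations each.

```python
def h_index(citations):
    """Count the ranked positions where the citation count is at least the rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, count in enumerate(ranked, start=1) if count >= rank)

# Hypothetical records: one outlier paper versus consistent output.
one_hit = [500, 3, 2, 1, 0]
steady = [12, 11, 10, 9, 9, 8, 8, 7]

for name, record in [("one_hit", one_hit), ("steady", steady)]:
    print(name, "total:", sum(record), "h:", h_index(record))
# one_hit total: 506 h: 2
# steady total: 74 h: 7
```

The single paper with 500 citations dominates the total but adds only one to h, whereas the steady record earns a much higher h with far fewer citations overall.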

Hirsch suggested using data from the Institute for Scientific Information's Web of Knowledge and found, for example, that Edward Witten, with h = 110, had the highest h-index among physicists. Other high-scoring physicists were Marvin L. Cohen and P. W. Anderson.

Google Scholar can also be used as the basis for computing the h-index, but it will produce different numbers from counts based on classic citation indexes, as its citation numbers are sometimes dramatically different. For research fields such as computer science, Google Scholar is liable to produce a higher h, largely because traditional citation indexes cover high-impact conferences poorly and Google Scholar covers web-based publications well. For fields that publish more in journals and whose scholars are less inclined to put preprints on the Web, a Google Scholar-based h is likely to be lower.

References and external links

 * Jorge E. Hirsch, "An index to quantify an individual's scientific research output," 2005. Available from arXiv:.
 * Alireza Noruzi, "Google Scholar: The New Generation of Citation Indexes," Libri, 55(4), December 2005. (Full text requires subscription)
 * Science-Metrix