Reading

The following is the reading list for my diploma thesis; it is a good overview of my interests at the moment.

[1] Vo Ngoc Anh and Alistair Moffat. Melbourne University 2004: Terabyte and Web tracks. In Voorhees and Buckland [205].
[ bib | www ]
[2] Nick Craswell and David Hawking. Overview of the TREC-2004 Web track. In Voorhees and Buckland [205].
[ bib | www ]
[3] William A. Woods. What's in a link: Foundations for semantic networks. In Bobrow and Collins [199].
[ bib ]
[4] Hugo Zaragoza, Nick Craswell, Michael Taylor, Suchi Saria, and Stephen Robertson. Microsoft Cambridge and TREC-13: Web and HARD tracks. In Voorhees and Buckland [205].
[ bib | www ]
[5] Lada A. Adamic and Eytan Adar. How to search a social network. Social Networks, 27(3):187-203, July 2005.
[ bib | www ]
[6] Réka Albert, Hawoong Jeong, and Albert-László Barabási. Internet: Diameter of the world-wide web. Nature, 401(6749):130-131, September 1999.
[ bib | www ]
[7] John R. Anderson. The Architecture of Cognition. Cognitive Science Series. Harvard University Press, 1983.
[ bib ]
Keywords: spreading activation
[8] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley, 1999.
[ bib ]
Keywords: information retrieval
[9] Albert-László Barabási. Linked. Plume Books, 2003.
[ bib ]
[10] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286:509-512, October 1999.
[ bib | doi ]
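The growth mechanism behind this result is easy to simulate. Below is a minimal sketch (mine, not the authors' code; the parameters and the seed network are illustrative assumptions): each new node attaches m edges to existing nodes chosen with probability proportional to their current degree, which produces the heavy-tailed degree distributions reported in the paper.

```python
import random
from collections import Counter

def barabasi_albert(n, m, seed=None):
    """Toy preferential-attachment model: each new node attaches m edges to
    existing nodes picked with probability proportional to their degree."""
    rng = random.Random(seed)
    edges = []
    # 'endpoints' lists every node once per incident edge, so a uniform draw
    # from it is a degree-proportional (preferential) choice.
    endpoints = []
    # seed network: node m is connected to nodes 0..m-1
    for t in range(m):
        edges.append((m, t))
        endpoints.extend([m, t])
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend([new, t])
    return edges

if __name__ == "__main__":
    edges = barabasi_albert(10000, 3, seed=1)
    degrees = Counter(v for e in edges for v in e)
    # the degree distribution should show a heavy, power-law-like tail
    print(Counter(degrees.values()).most_common(10))
```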
[11] Steven M. Beitzel, Ophir Frieder, Eric C. Jensen, David Grossman, Abdur Chowdhury, and Nazli Goharian. Disproving the fusion hypothesis: an analysis of data fusion via effective information retrieval strategies. In SAC '03: Proceedings of the 2003 ACM symposium on Applied computing, pages 823-827, New York, NY, USA, 2003. ACM Press.
[ bib | doi ]
Keywords: data fusion
[12] R. K. Belew. Adaptive information retrieval: using a connectionist representation to retrieve and learn about documents. In SIGIR '89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, pages 11-20, New York, NY, USA, 1989. ACM Press.
[ bib | doi ]
AIR represents a connectionist approach to the task of information retrieval. The system uses relevance feedback from its users to change its representation of authors, index terms and documents so that, over time, AIR improves at its task. The result is a representation of the consensual meaning of keywords and documents shared by some group of users. The central focus of this paper is to use our experience with AIR to highlight those characteristics of connectionist representations that make them particularly appropriate for IR applications. We argue that this associative representation is a natural generalization of traditional IR techniques, and that connectionist learning techniques are effective in this setting.
Keywords: spreading activation, information retrieval
[13] N. J. Belkin, P. Kantor, C. Cool, and R. Quatrain. Combining evidence for information retrieval. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), number 500-215 in NIST Special Publications, pages 35-44. U.S. National Institute of Standards and Technology (NIST), 1994.
[ bib | www ]
Keywords: data fusion
[14] N. J. Belkin, P. Kantor, E. A. Fox, and J. A. Shaw. Combining the evidence of multiple query representations for information retrieval. Information Processing and Management, 31(3):431-448, 1995. Based on [15] and [64].
[ bib | doi ]
Keywords: data fusion
[15] Nicholas J. Belkin, C. Cool, W. Bruce Croft, and James P. Callan. The effect of multiple query representations on information retrieval system performance. In SIGIR '93: Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval, pages 339-346, New York, NY, USA, 1993. ACM Press.
[ bib | doi ]
Keywords: data fusion
[16] Nicholas J. Belkin and W. Bruce Croft. Information filtering and information retrieval: two sides of the same coin? Commun. ACM, 35(12):29-38, 1992.
[ bib | doi ]
Keywords: information filtering, information retrieval
[17] Adam Berger and John Lafferty. Information retrieval as statistical translation. In SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 222-229, New York, NY, USA, 1999. ACM Press.
[ bib | doi ]
[18] Tim Berners-Lee, Robert Cailliau, Ari Luotonen, Henrik Frystyk Nielsen, and Arthur Secret. The World-Wide Web. Communications of the ACM, 37(8):76-82, 1994.
[ bib | doi ]
Keywords: www
[19] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34-43, May 2001.
[ bib | www ]
A new form of Web content that is meaningful to computers will unleash a revolution of new possibilities
[20] Michael W. Berry, Susan T. Dumais, and Gavin W. O'Brien. Using linear algebra for intelligent information retrieval. Technical Report UT-CS-94-270, University of Tennessee, Knoxville, 1994.
[ bib | www ]
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents...
Keywords: information retrieval, latent semantic indexing
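For orientation, here is a small sketch of the latent semantic indexing idea described above: a truncated SVD of a term-document matrix, with queries folded into the reduced space. The matrix, the query and the value of k are made up for illustration; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

# Truncated SVD: keep only the k largest singular values/vectors.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Fold a query (term-frequency vector) into the latent space:
# q_hat = q^T U_k S_k^{-1}; documents live in the rows of (S_k V_k^T)^T.
q = np.array([1, 1, 0, 0], dtype=float)
q_hat = q @ Uk @ np.diag(1.0 / sk)
docs = (np.diag(sk) @ Vtk).T          # one row per document in latent space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

scores = [cos(q_hat, d) for d in docs]
print(scores)  # similarity of each of the 4 documents to the query
```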
[21] Krishna Bharat. SearchPad: explicit capture of search context to support Web search. Computer Networks, 33(1-6):493-501, 2000.
[ bib | doi ]
Keywords: information retrieval, personalization
[22] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.
[ bib | www ]
[23] Ronald J. Brachman and Deborah L. McGuinness. Knowledge representation, connectionism and conceptual retrieval. In SIGIR '88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages 161-174, New York, NY, USA, 1988. ACM Press.
[ bib | doi ]
[24] Ronald J. Brachman and Brian C. Smith. Special issue on knowledge representation. SIGART Bulletin, (70), 1980.
[ bib | doi ]
[25] John S. Breese, David Heckerman, and Carl Kadie. Empirical analysis of predictive algorithms for collaborative filtering. Technical Report MSR-TR-98-12, Microsoft Research, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052, May 1998.
[ bib | www ]
Collaborative filtering or recommender systems use a database about user preferences to predict additional topics or products a new user might like. In this paper we describe several algorithms designed for this task, including techniques based on correlation coefficients, vector-based similarity calculations, and statistical Bayesian methods. We compare the predictive accuracy of the various methods in a set of representative problem domains. We use two basic classes of evaluation metrics. The first characterizes accuracy over a set of individual predictions in terms of average absolute deviation. The second estimates the utility of a ranked list of suggested items. This metric uses an estimate of the probability that a user will see a recommendation in an ordered list. Experiments were run for datasets associated with 3 application areas, 4 experimental protocols, and the 2 evaluation metrics for the various algorithms. Results indicate that for a wide range of conditions, Bayesian networks with decision trees at each node and correlation methods outperform Bayesian-clustering and vector-similarity methods. Between correlation and Bayesian networks, the preferred method depends on the nature of the dataset, nature of the application (ranked versus one-by-one presentation), and the availability of votes with which to make predictions. Other considerations include the size of database, speed of predictions, and learning time.
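As a reminder of what the correlation-based baseline in this comparison looks like, here is a toy memory-based predictor: the active user's mean rating plus a correlation-weighted sum of the other users' deviations from their own means. The user names, ratings and the simplified weighting are illustrative assumptions, not the paper's exact setup.

```python
import math

# Hypothetical user -> {item: rating} data, invented for illustration.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob":   {"a": 4, "b": 2, "c": 5, "d": 3},
    "carol": {"a": 2, "b": 5, "d": 4},
}

def mean(u):
    v = ratings[u]
    return sum(v.values()) / len(v)

def pearson(a, b):
    """Correlation over the items both users rated (0 if fewer than 2)."""
    common = set(ratings[a]) & set(ratings[b])
    if len(common) < 2:
        return 0.0
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][j] - ma) * (ratings[b][j] - mb) for j in common)
    da = math.sqrt(sum((ratings[a][j] - ma) ** 2 for j in common))
    db = math.sqrt(sum((ratings[b][j] - mb) ** 2 for j in common))
    return num / (da * db) if da and db else 0.0

def predict(active, item):
    """Memory-based prediction: active user's mean plus a correlation-weighted
    sum of the other raters' deviations from their own means."""
    others = [u for u in ratings if u != active and item in ratings[u]]
    weights = [(u, pearson(active, u)) for u in others]
    norm = sum(abs(w) for _, w in weights)
    if norm == 0:
        return mean(active)
    return mean(active) + sum(w * (ratings[u][item] - mean(u)) for u, w in weights) / norm

print(predict("alice", "d"))
```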
[26] Dan Brickley and Libby Miller. FOAF vocabulary specification. http://xmlns.com/foaf/0.1/, 2005.
[ bib ]
[27] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7):107-117, 1998.
[ bib | www ]
Keywords: information retrieval, web retrieval
[28] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. In Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking, pages 309-320, Amsterdam, The Netherlands, 2000. North-Holland Publishing Co.
[ bib | doi | www ]
[29] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer Networks, 33:309-320, 2000.
[ bib | www ]
[30] Vannevar Bush. As we may think. interactions, 3(2):35-46, 1996. Reprint of [31].
[ bib | doi ]
[31] Vannevar Bush. As we may think. The Atlantic Monthly, 176(1):101-108, July 1945. Reprinted as [30].
[ bib | www ]
[32] Iain Campbell. The ostensive model of developing information-needs. PhD thesis, University of Glasgow, 2000.
[ bib | www ]
From intuitions and informal observations of searching behaviour, a formal model is developed of cognition during a searching session. The model is of the iterative updating of an information-need by exposure of a user to information during a session. The model is path-based, using trends within the content of objects on a path to predict the current information-need. This provides contextual interpretation of objects based upon the path taken to an object. The model is ostensive in nature; however, instead of the active communicated evidence of traditional conceptions of ostension, it uses passive observational evidence. It produces a new notion of relevance: Ostensive Relevance, profiles of which are the key to the effective use of path information. The integration of the Ostensive Model and the Binary Probabilistic Model is achieved by weakening of a conventional assumption in the estimation of a probabilistic parameter. This integration effects a novel combination of objective and subjective probabilities, commonly regarded as incompatible. The Ostensive Model is instantiated in a combination of a networked IR server and a novel graphical user-interface. The interface presents a fish-eyed view of a growing multi-path browsing surface that hides internal representations and obviates querying. The hiding of internals, combined with the ability of the Ostensive Model to follow a developing information-need, makes the interface a truly media-neutral searching environment. A new test collection of general interest images with four binary relevance assessments is constructed and used for an evaluation of three Ostensive Relevance Profiles. The results are analysed in the light of different interpretations of the multiple assessments of the test-collection. The evaluation method is itself analysed and concrete proposals made for its development. The results of the evaluation provide strong encouragement for the Ostensive approach.
Keywords: information retrieval, ostensive retrieval
[33] Iain Campbell and Cornelis J. van Rijsbergen. The ostensive model of developing information needs. In Proceedings of COLIS-96, 2nd International Conference on Conceptions of Library Science, pages 251-268, Kobenhavn, DK, 1996.
[ bib | www ]
We present a model of the progressive development of information needs. It is a model that recognises the changing uncertainty inherent in a user's cognition of his information need. The approach centres around the collection and combination of ostensive evidence. We present uncertainty profiles associated with the discounting of ostensive evidence with respect to its age, and relate that to a new notion of relevance - Ostensive Relevance. This notion recognises the transient, inaccessible, spatio-temporal nature of relevance. We describe how these components come together to allow the Ostensive Model to be integrated with the traditional Binary Probabilistic Model. We describe the integration and show that it reveals an implicit assumption in the conventional estimation procedure for a particular conditional probability. The temporal aspects of the Ostensive Model allow a weakening of that assumption. The integration allows direct implementation of the Ostensive Model. Finally, we present an example that demonstrates the intuitive appeal of the approach over existing approaches to Relevance Feedback.
Keywords: information retrieval, ostensive retrieval
[34] Maciej Ceglowski, Aaron Coburn, and John Cuadrado. Semantic search of unstructured data using contextual network graphs, 2003.
[ bib | www ]
Keywords: spreading activation, information retrieval
[35] Fan Chung and Linyuan Lu. The average distances in random graphs with given expected degrees. Proceedings of the National Academy of Sciences of the United States of America, 99(25):15879-15882, December 2002.
[ bib | doi | www ]
Random graph theory is used to examine the small-world phenomenon; any two strangers are connected through a short chain of mutual acquaintances. We will show that for certain families of random graphs with given expected degrees the average distance is almost surely of order log n / log d, where d is the weighted average of the sum of squares of the expected degrees. Of particular interest are power law random graphs in which the number of vertices of degree k is proportional to 1/k^beta for some fixed exponent beta. For the case of beta > 3, we prove that the average distance of the power law graphs is almost surely of order log n / log d. However, many Internet, social, and citation networks are power law graphs with exponents in the range 2 < beta < 3 for which the power law random graphs have average distance almost surely of order log log n, but have diameter of order log n (provided having some mild constraints for the average distance and maximum degree). In particular, these graphs contain a dense subgraph, which we call the core, having n^(c/log log n) vertices. Almost all vertices are within distance log log n of the core although there are vertices at distance log n from the core.
[36] Kenneth Ward Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Comput. Linguist., 16(1):22-29, 1990.
[ bib | www ]
The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose a new objective measure based on the information theoretic notion of mutual information, for estimating word association norms from computer readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Keywords: computational linguistics, statistics
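A small sketch of the association ratio idea, estimated from a raw token stream with a fixed co-occurrence window. The window size and the example text are mine, not the paper's, and no smoothing or frequency cutoff is applied.

```python
import math
from collections import Counter

def association_ratio(tokens, window=5):
    """Estimate log2( P(x, y) / (P(x) P(y)) ) from a token sequence, counting
    co-occurrences of y within `window` tokens after x (a rough analogue of
    the Church/Hanks association ratio)."""
    n = len(tokens)
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pairs[(x, y)] += 1

    def score(x, y):
        if pairs[(x, y)] == 0:
            return float("-inf")
        p_xy = pairs[(x, y)] / n
        p_x, p_y = unigrams[x] / n, unigrams[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return score

text = "the doctor asked the nurse to call a doctor and a nurse".split()
score = association_ratio(text, window=5)
print(score("doctor", "nurse"), score("the", "a"))
```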
[37] Paul R. Cohen and Rick Kjeldsen. Information retrieval by constrained spreading activation in semantic networks. Inf. Process. Manage., 23(4):255-268, 1987.
[ bib | doi ]
Keywords: information retrieval, spreading activation, GRANT
[38] Nick Craswell, Stephen Robertson, Hugo Zaragoza, and Michael Taylor. Relevance weighting for query independent evidence. In SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, pages 416-423, New York, NY, USA, 2005. ACM Press.
[ bib | doi | www ]
[39] F. Crestani. Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6):453-482, 1997.
[ bib | doi ]
This paper surveys the use of Spreading Activation techniques on Semantic Networks in Associative Information Retrieval. The major Spreading Activation models are presented and their application to IR is surveyed. A number of works in this area are critically analyzed in order to study the relevance of Spreading Activation for associative IR.
Keywords: spreading activation, information storage and retrieval, semantic networks, associative information retrieval, information processing, knowledge representation
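Since constrained spreading activation recurs in several entries here ([12], [34], [37], [40], [41]), a rough sketch of the basic loop: activation flows from seed nodes over weighted links, attenuated by a decay factor and limited by threshold and distance constraints. The graph, node names and parameter values below are illustrative assumptions only.

```python
from collections import defaultdict

def spread_activation(graph, seeds, decay=0.5, threshold=0.05, max_steps=3):
    """Constrained spreading activation over a weighted directed graph:
    activation spreads outward from the seed nodes, attenuated by `decay`;
    nodes below `threshold` do not fire, and `max_steps` bounds the distance."""
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, a in frontier.items():
            if a < threshold:
                continue  # constraint: weakly activated nodes do not propagate
            for neighbour, weight in graph.get(node, []):
                next_frontier[neighbour] += a * weight * decay
        for node, a in next_frontier.items():
            activation[node] += a
        frontier = next_frontier
    return dict(activation)

# Hypothetical associative network: term/document nodes with weighted links.
graph = {
    "query:java": [("doc1", 0.8), ("term:coffee", 0.3), ("term:programming", 0.7)],
    "term:programming": [("doc2", 0.9), ("doc3", 0.4)],
    "term:coffee": [("doc4", 0.6)],
}
print(spread_activation(graph, {"query:java": 1.0}))
```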
[40] Fabio Crestani and Puay Leng Lee. Searching the web by constrained spreading activation. Information Processing and Management, 36(4):585-605, 2000.
[ bib | doi ]
[41] Fabio Crestani and Puay Leng Lee. WebSCSA: Web search by constrained spreading activation. In ADL '99: Proceedings of the IEEE Forum on Research and Technology Advances in Digital Libraries, page 163, Washington, DC, USA, 1999. IEEE Computer Society.
[ bib ]
Keywords: information retrieval, web retrieval, spreading activation
[42] Fabio Crestani and Cornelis J. van Rijsbergen. A model for adaptive information retrieval. Journal of Intelligent Information Systems, 8:29-56, 1997.
[ bib | doi | www ]
The paper presents a network model that can be used to produce conceptual and logical schemas for Information Retrieval applications. The model has interesting adaptability characteristics and can be instantiated in various effective ways. The paper also reports the results of an experimental investigation into the effectiveness of implementing associative and adaptive retrieval on the proposed model by means of Neural Networks. The implementation makes use of the learning and generalisation capabilities of the Backpropagation learning algorithm to build up and use application domain knowledge in a sub-symbolic form. The knowledge is acquired from examples of queries and relevant documents. Three different learning strategies are introduced, their performance is analysed and compared with the performance of a traditional Information Retrieval system.
Keywords: information retrieval, adaptive retrieval, model
[43] W. B. Croft, T. J. Lucia, and P. R. Cohen. Retrieving documents by plausible inference: a preliminary study. In SIGIR '88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages 481-494, New York, NY, USA, 1988. ACM Press.
[ bib | doi ]
Keywords: information retrieval, associative retrieval
[44] Francisco Matias Cuenca-Acuna and Thu D. Nguyen. Text-based content search and retrieval in ad hoc p2p communities. In International Workshop on Peer-to-Peer Computing (co-located with Networking 2002), volume 2376 of Lecture Notes in Computer Science. Springer-Verlag, May 2002.
[ bib | www ]
[45] Raymond D'Amore. Expertise community detection. In SIGIR '04: Proceedings of the 27th annual international conference on Research and development in information retrieval, pages 498-499, New York, NY, USA, 2004. ACM Press.
[ bib | doi ]
Keywords: expert finding, expert location
[46] Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391-407, 1990.
[ bib | www ]
[47] Jens-Peter Dittrich, Marcos Antonio Vaz Salles, Donald Kossmann, and Lukas Blunschi. iMeMex: Escapes from the personal information jungle. In Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005.
[ bib ]
[48] Peter Sheridan Dodds, Roby Muhamad, and Duncan J. Watts. An experimental study of search in global social networks. Science, 301:827-829, August 2003.
[ bib ]
We report on a global social-search experiment in which more than 60,000 e-mail users attempted to reach one of 18 target persons in 13 countries by forwarding messages to acquaintances. We find that successful social search is conducted primarily through intermediate to weak strength ties, does not require highly connected "hubs" to succeed, and, in contrast to unsuccessful social search, disproportionately relies on professional relationships. By accounting for the attrition of message chains, we estimate that social searches can reach their targets in a median of five to seven steps, depending on the separation of source and target, although small variations in chain lengths and participation rates generate large differences in target reachability. We conclude that although global social networks are, in principle, searchable, actual success depends sensitively on individual incentives.
[49] D. Donato, L. Laura, S. Leonardi, and S. Millozzi. Large scale properties of the webgraph. European Physical Journal B, 38:239-243, 2004.
[ bib | doi ]
[50] Louis G. Doray and Michel Arsenault. Estimators of the regression parameters of the zeta distribution. Insurance: Mathematics and Economics, 30(3):439-450, June 2002.
[ bib | doi ]
The zeta distribution with regression parameters has been rarely used in statistics because of the difficulty of estimating the parameters by traditional maximum likelihood. We propose an alternative method for estimating the parameters based on an iteratively reweighted least-squares algorithm. The quadratic distance estimator (QDE) obtained is consistent, asymptotically unbiased and normally distributed; the estimate can also serve as the initial value required by an algorithm to maximize the likelihood function. We illustrate the method with a numerical example from the insurance literature; we compare the values of the estimates obtained by the quadratic distance and maximum likelihood methods and their approximate variance-covariance matrix. Finally, we calculate the bias, variance and the asymptotic efficiency of the QDE compared to the maximum likelihood estimator (MLE) for some values of the parameters.
Keywords: Zeta distribution; Covariates; Maximum likelihood; Quadratic distance estimator; Iteratively reweighted least-squares; Asymptotic efficiency
[51] Patrick Doreian. A measure of standing for citation networks within a wider environment. Information Processing & Management, 30(1):21-31, 1994.
[ bib ]
[52] Cynthia Dwork, Ravi Kumar, Moni Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW '01: Proceedings of the 10th international conference on World Wide Web, pages 613-622, New York, NY, USA, 2001. ACM Press.
[ bib | doi | www ]
[53] Paul Erdős and Alfréd Rényi. On random graphs. Publicationes Mathematicae, 6:290-297, 1959.
[ bib ]
[54] J. Fagan. Automatic phrase indexing for document retrieval. In SIGIR '87: Proceedings of the 10th annual international ACM SIGIR conference on Research and development in information retrieval, pages 91-101, New York, NY, USA, 1987. ACM Press.
[ bib | doi ]
[55] Ronald Fagin, Ravi Kumar, Kevin S. McCurley, Jasmine Novak, D. Sivakumar, John A. Tomlin, and David P. Williamson. Searching the workplace web. In WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 366-375, New York, NY, USA, 2003. ACM Press.
[ bib | doi ]
[56] Ayman Farahat, Thomas LoFaro, Joel C. Miller, Gregory Rae, and Leslie A. Ward. Authority rankings from HITS, PageRank and SALSA: Existence, uniqueness and effect of initializations. SIAM Journal on Scientific Computing, 27(4):1181-1201, 2005.
[ bib | doi | www ]
Algorithms such as Kleinberg's HITS algorithm, the PageRank algorithm of Brin and Page, and the SALSA algorithm of Lempel and Moran use the link structure of a network of web pages to assign weights to each page in the network. The weights can then be used to rank the pages as authoritative sources. These algorithms share a common underpinning; they find a dominant eigenvector of a nonnegative matrix that describes the link structure of the given network and use the entries of this eigenvector as the page weights. We use this commonality to give a unified treatment, proving the existence of the required eigenvector for the PageRank, HITS, and SALSA algorithms, the uniqueness of the PageRank eigenvector, and the convergence of the algorithms to these eigenvectors. However, we show that the HITS and SALSA eigenvectors need not be unique. We examine how the initialization of the algorithms affects the final weightings produced. We give examples of networks that lead the HITS and SALSA algorithms to return nonunique or nonintuitive rankings. We characterize all such networks in terms of the connectivity of the related HITS authority graph. We propose a modification, Exponentiated Input to HITS, to the adjacency matrix input to the HITS algorithm. We prove that Exponentiated Input to HITS returns a unique ranking, provided that the network is weakly connected. Our examples also show that SALSA can give inconsistent hub and authority weights, due to nonuniqueness. We also mention a small modification to the SALSA initialization which makes the hub and authority weights consistent.
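As a reference point for the eigenvector methods analysed here, a bare-bones power-iteration sketch of HITS on a toy graph (fixed iteration count; the uniqueness and initialization issues the paper studies are ignored):

```python
import math

def hits(graph, iterations=50):
    """Power-iteration sketch of HITS: good hubs point to good authorities,
    good authorities are pointed to by good hubs."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # authority score: sum of hub scores of pages linking to the node
        auth = {n: 0.0 for n in nodes}
        for u, targets in graph.items():
            for v in targets:
                auth[v] += hub[u]
        # hub score: sum of authority scores of pages the node links to
        hub = {u: sum(auth[v] for v in graph.get(u, [])) for u in nodes}
        # normalise to keep the iteration bounded
        na = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        nh = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        auth = {n: a / na for n, a in auth.items()}
        hub = {n: h / nh for n, h in hub.items()}
    return hub, auth

# Hypothetical link graph, for illustration only.
graph = {"p1": ["p3", "p4"], "p2": ["p3"], "p3": ["p4"], "p4": []}
print(hits(graph))
```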
[57] Richard P. Feynman. What Do You Care What Other People Think? Bantam, 1989.
[ bib ]
[58] Larry Fitzpatrick and Mei Dent. Automatic feedback using past queries: social searching? In SIGIR '97: Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval, pages 306-313, New York, NY, USA, 1997. ACM Press.
[ bib | doi ]
[59] Gary William Flake, Steve Lawrence, and C. Lee Giles. Efficient identification of web communities. In KDD '00: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 150-160. ACM Press, 2000.
[ bib | doi ]
[60] Gary William Flake, Steve Lawrence, C. Lee Giles, and Frans M. Coetzee. Self-organization and identification of web communities. Computer, 35(3):66-71, 2002.
[ bib | doi ]
[61] Gary William Flake, Kostas Tsioutsiouliklis, and Leonid Zhukov. Methods for mining web communities: Bibliometric, spectral, and flow. In Alexandra Poulovassilis and Mark Levene, editors, Web Dynamics, chapter 4, pages 45-68. Springer Verlag, 2004.
[ bib | www ]
In this chapter, we examine the problem of Web community identification expressed in terms of the graph or network structure induced by the Web. While the task of community identification is obviously related to the more fundamental problems of graph partitioning and clustering, the basic task is differentiated from other problems by being within the Web domain. This single difference has many implications for how effective methods work, both in theory and in practice. In order of presentation, we will examine bibliometric similarity measures, bipartite community cores, the HITS algorithm, PageRank, and maximum flow-based Web communities. Interestingly, each of these topics relates to one another in a non-trivial manner.
[62] Gary William Flake, Kostas Tsioutsiouliklis, and Leonid Zhukov. Methods for mining web communities: Bibliometric, spectral, and flow. Technical Report OR-2003-004, Overture Research, 2003.
[ bib | www ]
In this chapter, we examine the problem of Web community identification expressed in terms of the graph or network structure induced by the Web. While the task of community identification is obviously related to the more fundamental problems of graph partitioning and clustering, the basic task is differentiated from other problems by being within the Web domain. This single difference has many implications for how effective methods work, both in theory and in practice. In order of presentation, we will examine bibliometric similarity measures, bipartite community cores, the HITS algorithm, PageRank, and maximum flow-based Web communities. Interestingly, each of these topics relates to one another in a non-trivial manner.
[63] Santo Fortunato, Alessandro Flammini, Filippo Menczer, and Alessandro Vespignani. The egalitarian effect of search engines. Technical Report cs.CY/0511005, arXiv.org, 2005. Preprint.
[ bib | www ]
[64] Edward A. Fox and Joseph A. Shaw. Combination of multiple searches. In D. K. Harman, editor, Proceedings of the Second Text REtrieval Conference (TREC-2), number 500-215 in NIST Special Publications. U.S. National Institute of Standards and Technology (NIST), 1994.
[ bib | www ]
Keywords: data fusion
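The score-combination rules usually associated with this line of work, CombSUM and CombMNZ, are easy to sketch. The runs and the min-max score normalisation below are illustrative assumptions, not the exact TREC setup.

```python
from collections import defaultdict

def comb_fusion(run_scores):
    """Score-based fusion in the spirit of the Fox/Shaw combination rules:
    CombSUM adds the normalised scores a document receives from the individual
    runs; CombMNZ additionally multiplies by the number of runs retrieving it."""
    def normalise(run):
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in run.items()}

    summed, hits = defaultdict(float), defaultdict(int)
    for run in run_scores:
        for doc, score in normalise(run).items():
            summed[doc] += score
            hits[doc] += 1
    comb_sum = dict(summed)
    comb_mnz = {d: summed[d] * hits[d] for d in summed}
    return comb_sum, comb_mnz

# Hypothetical retrieval runs: document -> retrieval status value.
run_a = {"d1": 12.0, "d2": 7.5, "d3": 3.0}
run_b = {"d2": 0.9, "d4": 0.7, "d1": 0.2}
print(comb_fusion([run_a, run_b]))
```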
[65] Linton C. Freeman. Centrality in social networks: Conceptual clarification. Social Networks, 1:215-239, 1979.
[ bib ]
[66] H. P. Frei and D. Stieger. The use of semantic links in hypertext information retrieval. Information Processing and Management, 31(1):1-13, 1995.
[ bib | doi ]
[67] Jill Freyne and Barry Smyth. An experiment in social search. In Wolfgang Nejdl and Paul De Bra, editors, Adaptive Hypermedia and Adaptive Web-Based Systems, Third International Conference, AH 2004, Eindhoven, The Netherlands, August 23-26, 2004, Proceedings, volume 3137 of Lecture Notes in Computer Science, pages 95-103. Springer, 2004.
[ bib ]
[68] J. Fürnkranz and P. A. Flach. An analysis of rule evaluation metrics. In Proceedings of the 20th International Conference on Machine Learning (ICML'03), pages 202-209. AAAI Press, January 2003.
[ bib | www ]
In this paper we analyze the most popular evaluation metrics for separate-and-conquer rule learning algorithms. Our results show that all commonly used heuristics, including accuracy, weighted relative accuracy, entropy, Gini index and information gain, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straight-forward generalization of the m-estimate trades off these two prototypes.
[69] John S. Garofolo, Ellen M. Voorhees, Vincent M. Stanford, and Karen Spärck Jones. TREC-6 1997 spoken document retrieval track overview and results. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Sixth Text REtrieval Conference TREC-6, number 500-240 in NIST Special Publications. U.S. National Institute of Standards and Technology (NIST), 1997.
[ bib | www ]
[70] Julien Gevrey and Stefan M. Rüger. Link-based approaches for text retrieval. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Tenth Text REtrieval Conference TREC-10, number 500-250 in NIST Special Publications, pages 279-285. U.S. National Institute of Standards and Technology (NIST), 2001.
[ bib | www ]
[71] David Gibson, Jon Kleinberg, and Prabhakar Raghavan. Inferring web communities from link topology. In HYPERTEXT '98: Proceedings of the ninth ACM conference on Hypertext and hypermedia: links, objects, time and space-structure in hypermedia systems, pages 225-234. ACM Press, 1998.
[ bib | doi ]
[72] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12):7821-7826, June 2002.
[ bib | doi | www ]
A number of recent studies have focused on the statistical properties of networked systems such as social networks and the Worldwide Web. Researchers have concentrated particularly on a few properties that seem to be common to many networks: the small-world property, power-law degree distributions, and network transitivity. In this article, we highlight another property that is found in many networks, the property of community structure, in which network nodes are joined together in tightly knit groups, between which there are only looser connections. We propose a method for detecting such communities, built around the idea of using centrality indices to find community boundaries. We test our method on computer-generated and real-world graphs whose community structure is already known and find that the method detects this known structure with high sensitivity and reliability. We also apply the method to two networks whose community structure is not well known-a collaboration network and a food web-and find that it detects significant and informative community divisions in both cases.
Keywords: Algorithms, Animals, Community Networks, Computer Simulation, Humans, Models, Nerve Net, Neural Networks (Computer), Non-P.H.S., Research Support, Social Behavior, Theoretical, U.S. Gov't, 12060727
[73] Malcolm Gladwell. Six degrees of Lois Weisberg. The New Yorker, January 1999.
[ bib | www ]
[74] Natalie S. Glance. Community search assistant. In IUI '01: Proceedings of the 6th international conference on Intelligent user interfaces, pages 91-96, New York, NY, USA, 2001. ACM Press.
[ bib | doi ]
[75] Melanie Gnasa, Sascha Alda, Jasmin Grigull, and Armin B. Cremers. Towards virtual knowledge communities in peer-to-peer networks. In Jamie Callan, Fabio Crestani, and Mark Sanderson, editors, Distributed Multimedia Information Retrieval, volume 2924 of Lecture Notes in Computer Science, pages 143-155. Springer, 2003.
[ bib | www ]
As a result of the anonymity in today's Web search, it is not possible to receive a personalized search result. Neither prior search results nor search results from other users are taken into consideration. In order to resolve this anonymity towards the search engine, a system is created which locally stores the search results in the scope of a peer-to-peer network. Using the Peer Search Memory (PeerSy), all approved bookmarks are stored and associated with the corresponding queries. By this means, repeated access is facilitated. Furthermore, sharing of bookmarks in the peer-to-peer network allows grouping of Virtual Knowledge Communities (VKC) in order to obtain a surplus value in knowledge sharing on the Web.
Keywords: ISKODOR
[76] Melanie Gnasa, Sascha Alda, Nadir Gül, Jasmin Grigull, and Armin B. Cremers. Cooperative pull-push cycle for searching a hybrid p2p network. In 4th International Conference on Peer-to-Peer Computing (P2P 2004), pages 192-199, Zürich, Switzerland, 2004.
[ bib | www ]
Information acquisition is a great challenge in the context of a continually growing Web. Nowadays, large Web search engines are primarily designed to assist an information pull by the user. On this platform, only actual information needs are handled without assistance of long-term needs. To overcome these shortcomings we propose a cooperative system for information pull and push on a peer-to-peer architecture. In this paper we present a hybrid network for a collaborative search environment, based on a local personalization strategy on each peer, and a highly available Web search service (e.g. Google). Each peer participates in the Pull-Push Cycle, and has the function of an information consumer as well as an information provider. Hence, long-term information needs can be identified without any context restrictions, and recommendations are computed based on Virtual Knowledge Communities.
Keywords: ISKODOR
[77] Melanie Gnasa, Markus Won, and Armin B. Cremers. Three pillars for congenial web search. Continuous evaluation for enhancing web search effectiveness. Journal of Web Engineering, 3(3&4):252-280, 2004.
[ bib | www ]
Keywords: ISKODOR
[78] M. L. Goldstein, S. A. Morris, and G. G. Yen. Problems with fitting to the power-law distribution. The European Physical Journal B, 41(2):255-258, September 2004.
[ bib | www ]
[79] Otis Gospodnetic and Erik Hatcher. Lucene in Action. Manning, 2005.
[ bib | www ]
[80] Mark S. Granovetter. The strength of weak ties. American Journal of Sociology, 78(6):1360-1380, May 1973.
[ bib | www ]
[81] R. Guha, Ravi Kumar, Prabhakar Raghavan, and Andrew Tomkins. Propagation of trust and distrust. In WWW '04: Proceedings of the 13th international conference on World Wide Web, pages 403-412, New York, NY, USA, 2004. ACM Press.
[ bib | doi ]
[82] Nadir Gül. MyPush - Ein kollaborativer Push Dienst für die automatische Informationsbeschaffung in einem Peer-to-Peer Netzwerk (MyPush: a collaborative push service for automatic information gathering in a peer-to-peer network). Diploma thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, March 2004.
[ bib ]
[83] Richard W. Hamming. Coding and Information Theory. Prentice-Hall, Englewood Cliffs, 1980.
[ bib ]
[84] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1):29-36, April 1982.
[ bib | www ]
A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the rating method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect differences in the accuracy of diagnostic techniques.
Keywords: Evaluation Studies, Humans, Mathematics, Models, Theoretical, Research Support, Non-U.S. Gov't, Statistics, Technology, Radiologic, 7063747
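The paper's central observation, that the area under the ROC curve equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one (the Wilcoxon/Mann-Whitney statistic), gives a direct way to compute it. The scores below are invented for illustration.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the Wilcoxon/Mann-Whitney statistic:
    the fraction of (positive, negative) pairs in which the positive case
    receives the higher score (ties count as one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical rating-method output: higher score = more suspicion of disease.
diseased = [0.9, 0.8, 0.7, 0.55]
healthy = [0.6, 0.4, 0.3, 0.2, 0.1]
print(auc(diseased, healthy))
```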
[85] D. Harman. Towards interactive query expansion. In SIGIR '88: Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval, pages 321-331, New York, NY, USA, 1988. ACM Press.
[ bib | doi ]
[86] Donna Harman. Relevance feedback revisited. In SIGIR '92: Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval, pages 1-10, New York, NY, USA, 1992. ACM Press.
[ bib | doi ]
[87] Taher Haveliwala. Efficient computation of PageRank. Technical report, Stanford University, October 1999.
[ bib | www ]
This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. We show that PageRank can be computed for very large subgraphs of the web (up to hundreds of millions of nodes) on machines with limited main memory. Running-time measurements on various memory configurations are presented for PageRank computation over the 24-million-page Stanford WebBase archive. We discuss several methods for analyzing the convergence of PageRank based on the induced ordering of the pages. We present convergence results helpful for determining the number of iterations necessary to achieve a useful PageRank assignment, both in the absence and presence of search queries.
Keywords: PageRank, search engine, link structure
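For orientation, a small power-iteration sketch of PageRank on a toy graph (fixed iteration count, naive dangling-page handling); this is nothing like the out-of-core techniques the report is actually about, just the basic iteration it analyses.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration sketch of PageRank on an adjacency-list graph.
    Dangling pages (no out-links) redistribute their mass uniformly."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not graph.get(u))
        new_rank = {}
        for node in nodes:
            incoming = sum(rank[u] / len(graph[u])
                           for u, targets in graph.items() if node in targets)
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank

# Hypothetical link graph, for illustration only.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
print(pagerank(graph))
```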
[88] Taher H. Haveliwala. Topic-sensitive PageRank. In WWW '02: Proceedings of the eleventh international conference on World Wide Web, pages 517-526. ACM Press, 2002.
[ bib | doi ]
[89] Taher H. Haveliwala, Aristides Gionis, Dan Klein, and Piotr Indyk. Evaluating strategies for similarity search on the web. In WWW '02: Proceedings of the 11th international conference on World Wide Web, pages 432-442, New York, NY, USA, 2002. ACM Press.
[ bib | doi ]
[90] Brian Hayes. A lucid interval. American Scientist, 91(6):484-488, November-December 2003.
[ bib | www ]
[91] Marti A. Hearst and Jan O. Pedersen. Reexamining the cluster hypothesis: scatter/gather on retrieval results. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 76-84, New York, NY, USA, 1996. ACM Press.
[ bib | doi ]
[92] Jonathan L. Herlocker, Joseph A. Konstan, and John Riedl. Explaining collaborative filtering recommendations. In CSCW '00: Proceedings of the 2000 ACM conference on Computer supported cooperative work, pages 241-250, New York, NY, USA, 2000. ACM Press.
[ bib | doi ]
[93] William C. Hill, James D. Hollan, Dave Wroblewski, and Tim McCandless. Edit wear and read wear. In CHI '92: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 3-9, New York, NY, USA, 1992. ACM Press.
[ bib | doi ]
[94] Paul W. Holland and Samuel Leinhardt. A method for detecting structure in sociometric data. American Journal of Sociology, 76(3):492-513, November 1970.
[ bib ]
[95] Bernardo A. Huberman and Lada A. Adamic. Information dynamics in the networked world. In Eli Ben-Naim, Hans Frauenfelder, and Zoltan Toroczkai, editors, Complex Networks, volume 650 of Lecture Notes in Physics, pages 371-398. Springer, 2004.
[ bib | www ]
[96] Bernardo A. Huberman, Peter L. T. Pirolli, James E. Pitkow, and Rajan M. Lukose. Strong regularities in world wide web surfing. Science, 280(5360):95-97, April 1998.
[ bib | doi | www ]
One of the most common modes of accessing information in the World Wide Web is surfing from one document to another along hyperlinks. Several large empirical studies have revealed common patterns of surfing behavior. A model that assumes that users make a sequence of decisions to proceed to another page, continuing as long as the value of the current page exceeds some threshold, yields the probability distribution for the number of pages that a user visits within a given Web site. This model was verified by comparing its predictions with detailed measurements of surfing patterns. The model also explains the observed Zipf-like distributions in page hits observed at Web sites.
[97] Svante Janson, Donald E. Knuth, Tomasz Łuczak, and Boris Pittel. The birth of the giant component. Random Structures & Algorithms, 4(3):233-358, 1993.
[ bib | www ]
[98] K. Spärck Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval, part 1. Information Processing and Management, 36:779-808, 2000.
[ bib | www ]
[99] K. Spärck Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval, part 2. Information Processing and Management, 36:809-840, 2000.
[ bib | www ]
[100] Sepandar D. Kamvar, Mario T. Schlosser, and Hector Garcia-Molina. The EigenTrust algorithm for reputation management in P2P networks. In WWW '03: Proceedings of the 12th international conference on World Wide Web, pages 640-651, New York, NY, USA, 2003. ACM Press.
[ bib | doi | www ]
[101] Paul B. Kantor and Ellen M. Voorhees. Report on the TREC-5 confusion track. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Fifth Text REtrieval Conference TREC-5, number 500-238 in NIST Special Publications. U.S. National Institute of Standards and Technology (NIST), 1996.
[ bib | www ]
[102] Henry Kautz, Bart Selman, and Mehul Shah. Referral Web: combining social networks and collaborative filtering. Communications of the ACM, 40(3):63-65, 1997.
[ bib | doi ]
Keywords: expert finding, expert location, world wide web, referrals
[103] Henry Kautz, Bart Selman, and Mehul Shah. The hidden web. AI Magazine, 18(2):27-36, 1997.
[ bib | www ]
Keywords: expert finding, expert location, world wide web, referrals
[104] Jon Kleinberg. The small-world phenomenon: an algorithmic perspective. In STOC '00: Proceedings of the thirty-second annual ACM symposium on Theory of computing, pages 163-170, New York, NY, USA, 2000. ACM Press.
[ bib | doi ]
[105] Jon Kleinberg and Andrew Tomkins. Applications of linear algebra in information retrieval and hypertext analysis. In PODS '99: Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 185-193, New York, NY, USA, 1999. ACM Press.
[ bib | doi ]
[106] Jon M. Kleinberg. Navigation in a small world. Nature, 406:845, August 2000.
[ bib | doi | www ]
[107] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604-632, 1999.
[ bib | doi ]
[108] Donald E. Knuth. Computer programming as an art. Commun. ACM, 17(12):667-673, 1974.
[ bib | doi ]
[109] Joseph S. Kong, P. Oscar Boykin, Behnam A. Rezaei, Nima Sarshar, and Vwani P. Roychowdhury. Let your CyberAlter ego share information and manage spam. Preprint, 2005.
[ bib | www ]
Almost all of us have multiple cyberspace identities, and these cyberalter egos are networked together to form a vast cyberspace social network. This network is distinct from the world-wide-web (WWW), which is being queried and mined to the tune of billions of dollars every day, and until recently, has gone largely unexplored. Empirically, the cyberspace social networks have been found to possess many of the same complex features that characterize its real counterparts, including scale-free degree distributions, low diameter, and extensive connectivity. We show that these topological features make the latent networks particularly suitable for explorations and management via local-only messaging protocols. Cyberalter egos can communicate via their direct links (i.e., using only their own address books) and set up a highly decentralized and scalable message passing network that can allow large-scale sharing of information and data. As one particular example of such collaborative systems, we provide a design of a spam filtering system, and our large-scale simulations show that the system achieves a spam detection rate close to 100% while keeping the false positive rate around zero. This system has several advantages over other recent proposals: (i) It uses an already existing network, created by the same social dynamics that govern our daily lives, and no dedicated peer-to-peer (P2P) systems or centralized server-based systems need be constructed; (ii) It utilizes a percolation search algorithm that makes the query-generated traffic scalable; (iii) The network has a built-in trust system (just as in social networks) that can be used to thwart malicious attacks; (iv) It can be implemented right now as a plugin to popular email programs, such as MS Outlook, Eudora, and Sendmail.
Keywords: Physics and Society; Disordered Systems and Neural Networks; Computers and Society; Networking and Internet Architecture
[110] Joseph A. Konstan, Bradley N. Miller, David Maltz, Jonathan L. Herlocker, Lee R. Gordon, and John Riedl. GroupLens: applying collaborative filtering to Usenet news. Communications of the ACM, 40(3):77-87, 1997.
[ bib | doi ]
[111] Wessel Kraaij, Thijs Westerveld, and Djoerd Hiemstra. The importance of prior probabilities for entry page search. In SIGIR '02: Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval, pages 27-34, New York, NY, USA, 2002. ACM Press.
[ bib | doi | www ]
[112] Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Trawling the web for emerging cyber-communities. In WWW '99: Proceeding of the eighth international conference on World Wide Web, pages 1481-1493, New York, NY, USA, 1999. Elsevier North-Holland, Inc.
[ bib | doi | www ]
[113] F. W. Lancaster. Information Retrieval Systems: Characteristics, Testing, and Evaluation. Wiley, New York, 1968.
[ bib ]
[114] Amy N. Langville and Carl D. Meyer. A survey of eigenvector methods for web information retrieval. SIAM Review, 47(1):135-161, February 2005.
[ bib | www ]
[115] Mark Lawson. Berners-Lee on the read/write web. Broadcast by Newsnight on BBC Two, August 2005. Interview with Tim Berners-Lee.
[ bib | www ]
[116] Joon Ho Lee. Analyses of multiple evidence combination. SIGIR Forum, 31(SI):267-276, 1997.
[ bib | doi ]
Keywords: data fusion
[117] S. Lehmann, B. Lautrup, and A. D. Jackson. Citation networks in high energy physics. Physical Review E, 68(2 Pt 2):026113, August 2003.
[ bib | doi | www ]
The citation network constituted by the SPIRES database is investigated empirically. The probability that a given paper in the SPIRES database has k citations is well described by simple power laws, P(k) proportional to k^(-alpha), with alpha approximately 1.2 for k less than 50 citations and alpha approximately 2.3 for 50 or more citations. A consideration of citation distribution by subfield shows that the citation patterns of high energy physics form a remarkably homogeneous network. Further, we utilize the knowledge of the citation distributions to demonstrate the extreme improbability that the citation records of selected individuals and institutions have been obtained by a random draw on the resulting distribution.
[118] Thomas W. Malone, Kenneth R. Grant, Franklyn A. Turbak, Stephen A. Brobst, and Michael D. Cohen. Intelligent information-sharing systems. Communications of the ACM, 30(5):390-402, 1987.
[ bib | doi ]
[119] Udi Manber. Foreword. In William B. Frakes and Ricardo Baeza-Yates, editors, Information Retrieval. Data Structures & Algorithms, pages v-vi. Prentice Hall, 1992.
[ bib ]
[120] Larry Masinter and Erik Ostrom. Collaborative information retrieval: Gopher from MOO. In Proceedings of INET '93, 1993.
[ bib | www ]
There are two visions of how use of the global network will evolve in the future. First, individuals will use the network as a resource providing access to material from libraries and other suppliers of information and entertainment. Second, in addition to communicating with these data sources, people will communicate with each other using a variety of interactive text, audio and video conferencing methods. This paper is about a system that combines the two uses, adding an information retrieval tool (Gopher) to a text-based virtual reality environment (MOO). The combination allows informal collaboration using information retrieval to happen across the network.
[121] David Mattox, Mark T. Maybury, and Daryl Morey. Enterprise expert and knowledge discovery. In Proceedings of the HCI International '99 (the 8th International Conference on Human-Computer Interaction), pages 303-307, Mahwah, NJ, USA, 1999. Lawrence Erlbaum Associates, Inc.
[ bib | www ]
Keywords: expert location, expert finding
[122] Mark Maybury, Ray D'Amore, and David House. Expert finding for collaborative virtual environments. Commun. ACM, 44(12):55-56, 2001.
[ bib | doi ]
Keywords: expert finding, expert location
[123] Sean M. McNee, Istvan Albert, Dan Cosley, Prateep Gopalkrishnan, Shyong K. Lam, Al Mamunur Rashid, Joseph A. Konstan, and John Riedl. On the recommending of citations for research papers. In CSCW '02: Proceedings of the 2002 ACM conference on Computer supported cooperative work, pages 116-125, New York, NY, USA, 2002. ACM Press.
[ bib | doi ]
[124] Filippo Menczer, Santo Fortunato, Alessandro Flammini, and Alessandro Vespignani. Googlearchy or Googlocracy? IEEE Spectrum, February 2006.
[ bib | www ]
[125] Stanley Milgram. The small-world problem. Psychology Today, 2:60-67, 1967.
[ bib ]
[126] David R. H. Miller, Tim Leek, and Richard M. Schwartz. A hidden Markov model information retrieval system. In SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, pages 214-221, New York, NY, USA, 1999. ACM Press.
[ bib | doi ]
[127] Nasim Ulrike Nadjafi. Virtuelle Gemeinschaften. Organisation - Interaktion (Virtual communities: organization and interaction). Master's thesis, Ludwig-Maximilians-Universität München, 1998.
[ bib ]
[128] M. E. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2):404-409, January 2001.
[ bib | doi | www ]
The structure of scientific collaboration networks is investigated. Two scientists are considered connected if they have authored a paper together and explicit networks of such connections are constructed by using data drawn from a number of databases, including MEDLINE (biomedical research), the Los Alamos e-Print Archive (physics), and NCSTRL (computer science). I show that these collaboration networks form small worlds, in which randomly chosen pairs of scientists are typically separated by only a short path of intermediate acquaintances. I further give results for mean and distribution of numbers of collaborators of authors, demonstrate the presence of clustering in the networks, and highlight a number of apparent differences in the patterns of collaboration between the fields studied.
Keywords: Authorship, Bibliographic, Bibliometrics, Cluster Analysis, Cooperative Behavior, Databases, Humans, MEDLINE, Models, Non-P.H.S., Non-U.S. Gov't, Research, Research Personnel, Research Support, Science, Theoretical, U.S. Gov't, 11149952
[129] M. E. Newman. Clustering and preferential attachment in growing networks. Physical Review E (Statistics, Nonlinear, and Soft Matter Physics), 64(2 Pt 2):025102, Aug 2001.
[ bib | doi | www ]
We study empirically the time evolution of scientific collaboration networks in physics and biology. In these networks, two scientists are considered connected if they have coauthored one or more papers together. We show that the probability of a pair of scientists collaborating increases with the number of other collaborators they have in common, and that the probability of a particular scientist acquiring new collaborators increases with the number of his or her past collaborators. These results provide experimental evidence in favor of previously conjectured mechanisms for clustering and power-law degree distributions in networks.
[130] M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5):323-351, September 2005.
[ bib | doi | www ]
When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the quantity is said to follow a power law, also known variously as Zipf's law or the Pareto distribution. Power laws appear widely in physics, biology, earth and planetary sciences, economics and finance, computer science, demography and the social sciences. For instance, the distributions of the sizes of cities, earthquakes, solar flares, moon craters, wars and people's personal fortunes all appear to follow power laws. The origin of power-law behaviour has been a topic of debate in the scientific community for more than a century. Here we review some of the empirical evidence for the existence of power-law forms and the theories proposed to explain them.
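One practical takeaway from this review (and from Goldstein et al. [78]) is to estimate power-law exponents by maximum likelihood rather than by least-squares fits to log-log histograms. A sketch of the continuous-data estimator discussed in the review (for discrete data it is only approximate), with a made-up sample and an arbitrary x_min:

```python
import math

def powerlaw_alpha_mle(values, x_min):
    """Maximum-likelihood estimate of a continuous power-law exponent,
    alpha = 1 + n / sum(ln(x_i / x_min)), using only observations >= x_min."""
    xs = [x for x in values if x >= x_min]
    n = len(xs)
    if n == 0:
        raise ValueError("no observations above x_min")
    return 1.0 + n / sum(math.log(x / x_min) for x in xs)

# Hypothetical degree sample; with real data the choice of x_min matters a lot.
sample = [1, 1, 2, 2, 2, 3, 4, 5, 8, 13, 40, 120]
print(powerlaw_alpha_mle(sample, x_min=2))
```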
[131] M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl. 1):5200-5205, April 2004.
[ bib | doi | www ]
By using data from three bibliographic databases in biology, physics, and mathematics, respectively, networks are constructed in which the nodes are scientists, and two scientists are connected if they have coauthored a paper. We use these networks to answer a broad variety of questions about collaboration patterns, such as the numbers of papers authors write, how many people they write them with, what the typical distance between scientists is through the network, and how patterns of collaboration vary between subjects and over time. We also summarize a number of recent results by other authors on coauthorship patterns.
Keywords: Authorship, Models, Neural Networks (Computer), Non-P.H.S., Non-U.S. Gov't, Periodicals, Research Support, Statistical, U.S. Gov't, 14745042
[132] M. E. J. Newman. Properties of highly clustered networks. Physical Review E, 68:026121, August 2003.
[ bib | www ]
We propose and solve exactly a model of a network that has both a tunable degree distribution and a tunable clustering coefficient. Among other things, our results indicate that increased clustering leads to a decrease in the size of the giant component of the network. We also study susceptible/infective/recovered type epidemic processes within the model and find that clustering decreases the size of epidemics, but also decreases the epidemic threshold, making it easier for diseases to spread. In addition, clustering causes epidemics to saturate sooner, meaning that they infect a near-maximal fraction of the network for quite low transmission rates.
[133] M. E. J. Newman. Random graphs as models of networks. Working Papers 02-02-005, Santa Fe Institute, February 2002.
[ bib | www ]
[134] M. E. J. Newman and M. Girvan. Mixing patterns and community structure in networks. In Romualdo Pastor-Satorras, Miguel Rubi, and Albert Diaz-Guilera, editors, Statistical Mechanics of Complex Networks, volume 625 of Lecture Notes in Physics, pages 66-87. Springer, January 2003.
[ bib | www ]
[135] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2 Pt 2):026113, February 2004.
[ bib ]
We propose and study a set of algorithms for discovering community structure in networks: natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible betweenness measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.
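To make the procedure concrete, here is a minimal sketch (an editorial illustration assuming the Python networkx library, not code from the paper) that applies an edge-betweenness splitting routine of the kind described above and uses the modularity measure to pick a division:

    # Sketch: Girvan-Newman-style community detection via networkx.
    # The karate-club example graph and the choice of stopping rule are
    # illustrative assumptions, not taken from the paper.
    import networkx as nx
    from networkx.algorithms.community import girvan_newman, modularity

    G = nx.karate_club_graph()
    best, best_q = None, float("-inf")
    for communities in girvan_newman(G):   # successively finer divisions
        q = modularity(G, communities)     # quality of this division
        if q > best_q:
            best, best_q = communities, q
    print(len(best), "communities, modularity %.3f" % best_q)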
[136] M. E. J. Newman and Juyong Park. Why social networks are different from other types of networks. Physical Review E, 68:036122, September 2003.
[ bib | doi | www ]
Keywords: social networks
[137] M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64:026118, 2001.
[ bib | doi | www ]
[138] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences of the USA, 99(Suppl. 1):2566-2572, February 2002.
[ bib | doi | www ]
We describe some new exactly solvable models of the structure of social networks, based on random graphs with arbitrary degree distributions. We give models both for simple unipartite networks, such as acquaintance networks, and bipartite networks, such as affiliation networks. We compare the predictions of our models to data for a number of real-world social networks and find that in some cases, the models are in remarkable agreement with the data, whereas in others the agreement is poorer, perhaps indicating the presence of additional social structure in the network that is not captured by the random graph.
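For orientation, the generating-function notation that these random-graph models rely on can be summarized as follows (an editorial sketch using standard conventions, not equations quoted from the papers): with degree distribution p_k,

    G_0(x) = \sum_{k \ge 0} p_k x^k, \qquad G_1(x) = \frac{G_0'(x)}{G_0'(1)},

where G_1 generates the degree of a vertex reached by following a random edge, and a giant component emerges when \sum_k k(k-2)\, p_k > 0 (the Molloy-Reed criterion).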
[139] Andrew Y. Ng, Alice X. Zheng, and Michael I. Jordan. Stable algorithms for link analysis. In SIGIR '01: Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, pages 258-266, New York, NY, USA, 2001. ACM Press.
[ bib | doi ]
[140] Klaus North, Kai Romhardt, and Gilbert Probst. Wissensgemeinschaften. Keimzellen lebendigen Wissensmanagements [Knowledge communities: nuclei of living knowledge management]. io management, 7/8:52-62, 2000.
[ bib | www ]
[141] Esko Nuutila and Eljas Soisalon-Soininen. On finding the strongly connected components in a directed graph. Information Processing Letters, 49:9-14, 1993.
[ bib | www ]
[142] Joshua O'Madadhain, Danyel Fisher, Padhraic Smyth, Scott White, and Yan-Biao Boey. Analysis and visualization of network data using JUNG. Journal of Statistical Software, 2005. To appear.
[ bib | www ]
The JUNG (Java Universal Network/Graph) Framework is a free, open-source software library that provides a common and extendible language for the manipulation, analysis, and visualization of data that can be represented as a graph or network. It is written in the Java programming language, allowing JUNG-based applications to make use of the extensive built-in capabilities of the Java Application Programming Interface (API), as well as those of other existing third-party Java libraries. We describe the design, and some details of the implementation, of the JUNG architecture, and provide illustrative examples of its use.
[143] Douglas W. Oard. The state of the art in text filtering. User Modeling and User-Adapted Interaction, 7(3):141-178, 1997.
[ bib | doi | www ]
This paper develops a conceptual framework for text filtering practice and research, and reviews present practice in the field. Text filtering is an information seeking process in which documents are selected from a dynamic text stream to satisfy a relatively stable and specific information need. A model of the information seeking process is introduced and specialized to define text filtering. The historical development of text filtering is then reviewed and case studies of recent work are used to highlight important design characteristics of modern text filtering systems. User modeling techniques drawn from information retrieval, recommender systems, machine learning and other fields are described. The paper concludes with observations on the present state of the art and implications for future research on text filtering.
Keywords: Information filtering, Text retrieval, Social filtering, Collaborative, Content-based, Selective Dissemination of Information, Current awareness, Recommender systems
[144] Paul Ogilvie and Jamie Callan. Combining document representations for known-item search. In SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval, pages 143-150, New York, NY, USA, 2003. ACM Press.
[ bib | doi ]
[145] Lawrence Page. Method for node ranking in a linked database. U.S. Patent 6,285,999, September 2001. Assignee: The Board of Trustees of the Leland Stanford Junior University (Stanford, CA).
[ bib | www ]
A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database. The rank assigned to a document is calculated from the ranks of documents citing it. In addition, the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document. The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality.
[146] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University, November 1999.
[ bib | www ]
The importance of a Web page is an inherently subjective matter, which depends on the reader's interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Web surfer. We show how to efficiently compute PageRank for large numbers of pages, and we show how to apply PageRank to search and to user navigation.
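As a toy illustration of the rank propagation described in these two entries (a sketch in Python; the damping value and link graph are made-up example data, not taken from the papers):

    # Sketch: power iteration for PageRank. Rank flows from citing pages,
    # with a constant probability of a random jump, as described above.
    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outgoing in links.items():
                if outgoing:
                    share = damping * rank[page] / len(outgoing)
                    for target in outgoing:
                        new_rank[target] += share
                else:
                    # dangling page: spread its rank evenly over all pages
                    for target in pages:
                        new_rank[target] += damping * rank[page] / n
            rank = new_rank
        return rank

    toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(pagerank(toy_web))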
[147] Gopal Pandurangan, Prabhakar Raghavan, and Eli Upfal. Using PageRank to characterize web structure. In COCOON '02: Proceedings of the 8th Annual International Conference on Computing and Combinatorics, number 2387 in Lecture Notes in Computer Science, pages 330-339, London, UK, 2002. Springer-Verlag.
[ bib ]
[148] Gopal Pandurangan, Prabhakar Raghavan, and Eli Upfal. Using pagerank to characterize web structure. In COCOON '02: Proceedings of the 8th Annual International Conference on Computing and Combinatorics, pages 330-339, London, UK, 2002. Springer-Verlag.
[ bib | www ]
[149] Alan J. Perlis. Epigrams in programming. ACM SIGPLAN Notices, 17(9):7-13, September 1982.
[ bib | www ]
[150] Gabriel Pinski and Francis Narin. Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics. Information Processing and Management, 12(5):297-312, 1976.
[ bib | doi ]
A self-consistent methodology is developed for determining citation based influence measures for scientific journals, subfields and fields. Starting with the cross citing matrix between journals or between aggregates of journals, an eigenvalue problem is formulated leading to a size independent influence weight for each journal or aggregate. Two other measures, the influence per publication and the total influence are then defined. Hierarchical influence diagrams and numerical data are presented to display journal interrelationships for journals within the subfields of physics. A wide range in influence is found between the most influential and least influential or peripheral journals.
[151] Peter Pirolli, James Pitkow, and Ramana Rao. Silk from a sow's ear: extracting usable structures from the Web. In CHI '96: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 118-125, New York, NY, USA, 1996. ACM Press.
[ bib | doi | www ]
In its current implementation, the World-Wide Web lacks much of the explicit structure and strong typing found in many closed hypertext systems. While this property probably relates to the explosive acceptance of the Web, it further complicates the already difficult problem of identifying usable structures and aggregates in large hypertext collections. These reduced structures, or localities, form the basis for simplifying visualizations of and navigation through complex hypertext systems. Much of the previous research into identifying aggregates utilizes graph theoretic algorithms based upon structural topology, i.e., the linkages between items. Other research has focused on content analysis to form document collections. This paper presents our exploration into techniques that utilize the topology and textual similarity between items as well as usage data collected by servers and page meta-information like title and size. Linear equations and spreading activation models are employed to arrange Web pages based upon functional categories, node types, and relevancy.
Keywords: Information Visualization, World Wide Web, Hypertext, spreading activation
[152] James Pitkow, Hinrich Schütze, Todd Cass, Rob Cooley, Don Turnbull, Andy Edmonds, Eytan Adar, and Thomas Breuel. Personalized search. Communications of the ACM, 45(9):50-55, 2002.
[ bib | doi ]
[153] Jay M. Ponte and W. Bruce Croft. A language modeling approach to information retrieval. In SIGIR '98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 275-281, New York, NY, USA, 1998. ACM Press.
[ bib | doi ]
[154] Scott Everett Preece. A Spreading Activation Network Model for Information Retrieval. PhD thesis, University of Illinois at Urbana-Champaign, 1981.
[ bib | www ]
[155] M. Ross Quillian. Semantic memory. In Marvin Minsky, editor, Semantic Information Processing. MIT Press, Cambridge, Mass., 1968.
[ bib ]
[156] M. Ross Quillian. The teachable language comprehender: a simulation program and theory of language. Commun. ACM, 12(8):459-476, 1969.
[ bib | doi ]
[157] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. GroupLens: an open architecture for collaborative filtering of netnews. In CSCW '94: Proceedings of the 1994 ACM conference on Computer supported cooperative work, pages 175-186, New York, NY, USA, 1994. ACM Press.
[ bib | doi ]
[158] C. J. van Rijsbergen. Information Retrieval. Butterworths, 1979.
[ bib | www ]
[159] S. E. Robertson, C. J. van Rijsbergen, and M. F. Porter. Probabilistic models of indexing and searching. In SIGIR '80: Proceedings of the 3rd annual ACM conference on Research and development in information retrieval, pages 35-56, Kent, UK, 1981. Butterworth & Co.
[ bib ]
[160] S. E. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In SIGIR '94: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 232-241, New York, NY, USA, 1994. Springer-Verlag New York, Inc.
[ bib ]
[161] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In D. K. Harman, editor, Proceedings of the Third Text REtrieval Conference (TREC-3), number 500-226 in NIST Special Publications, pages 109-126. U.S. National Institute of Standards and Technology (NIST), 1995.
[ bib | www ]
Keywords: BM25
[162] S.E. Robertson, H. Zaragoza, and M. Taylor. Simple BM25 extension to multiple weighted fields. In Thirteenth Conference on Information and Knowledge Management (CIKM), 2004.
[ bib | www ]
[163] Nicholas C. Romano, Jr., Dmitri Roussinov, Jay F. Nunamaker, Jr., and Hsinchun Chen. Collaborative information retrieval environment: Integration of information retrieval with group support systems. In HICSS '99: Proceedings of the Thirty-Second Annual Hawaii International Conference on System Sciences, Volume 1, pages 1053-1062, Washington, DC, USA, 1999. IEEE Computer Society.
[ bib ]
Observations of Information Retrieval (IR) system user experiences reveal strong desires for collaborative search efforts; however, the same experiences suggest that collaborative capabilities are rarely, and then only in a limited fashion, supported by current tools for searching and visualizing query results. Equally interesting, observations of user experiences with Group Support Systems (GSS) reveal that access to external information and the ability to search for relevant material are often vital to the progress of GSS sessions, yet integrated support for collaborative searching and visualization of results is lacking in GSS systems. After reviewing user experiences described in both the IR and GSS literature, and observing and interviewing users of existing commercial and prototype IR and GSS systems, the authors conclude that the demand for systems supporting multi-user IR is obvious, and find it surprising that very little attention has been given to the common ground shared by these two important research domains. With this in mind, the paper describes how user experiences with IR and GSS systems have shed light on a promising new area of collaborative research and led to the development of a prototype that merges the two paradigms into a Collaborative Information Retrieval Environment (CIRE). Finally, the paper presents theory developed from initial user experiences with the prototype and describes plans to empirically test the efficacy of this new paradigm through controlled experimentation.
[164] Robert W. Root. Design of a multi-media vehicle for social browsing. In CSCW '88: Proceedings of the 1988 ACM conference on Computer-supported cooperative work, pages 25-38, New York, NY, USA, 1988. ACM Press.
[ bib | doi ]
In this paper we present a new approach to the use of computer-mediated communications technology to support distributed cooperative work. In contrast to most of the existing approaches to CSCW, we focus explicitly on tools to enable unplanned, informal social interaction. We describe a "social interface" which provides direct, low-cost access to other people through the use of multi-media communications channels. The design of the system centers around three basic concepts derived from the research literature and our own observations of the workplace: social browsing, a virtual workplace, and interaction protocols. We use these design properties to describe a new system concept, and examine the implications for CSCW of having automated social interaction available through the desktop workstation.
[165] Thorsten Ruhl. Personal Search Memory - Design und Realisierung einer Suchschnittstelle zur kombinierten Suche in früheren und neuen Suchergebnissen [Personal search memory: design and implementation of a search interface for combined search in previous and new search results]. Diploma thesis, Rheinische Friedrich-Wilhelms-Universität Bonn, 2003.
[ bib ]
The goal of this thesis is the design and implementation of a search interface that supports combined search over previous and new search results. The interface is intended to let the user build a personal external search memory by creating and storing associations between queries and search results. In addition, all queries are logged, and the extensions and modifications of queries are stored in search trails, following Bush's ideas [Bush, 1945]. The need for a personalized search interface is reinforced by the fact that conventional search engines generate their result lists independently of the user and return many irrelevant hits; moreover, returning to previously found documents is not always easy. The basis for the interface design is the definition of use cases and an analysis of the system requirements. Four use cases were defined: 1) the interface should support searching for information, 2) the web documents behind the search results should be viewable, 3) search results should be ratable, and 4) the data of recorded query-result associations should be editable. At the outset, a requirements analysis was conducted with several test subjects. In general, the subjects asked for a simple and fast installation of the search interface, no restriction to a particular browser or platform, and natural, simple operation. Beyond that, each subject had a different idea of what the interface should do; a selection of these user wishes was later incorporated into the design and realized in the prototype. Based on the use cases, a concept for the search interface was designed, consisting of several components, among them a personalized search engine that searches previous results in the user's personal search memory, and a database that represents that memory. These components were examined in detail and recommendations for the prototype implementation were given. For personalized search, new ranking algorithms were developed and new relevance values defined that exploit the data in the personal search memory. A further component of the concept is the visualization of search results; for combined search, a new presentation based on the metaphor of sets was designed and critically evaluated. A comparison with similar systems showed, among other things, that those systems do not additionally record the underlying information need when a web page is bookmarked, and thus do not follow the query-result-association approach; furthermore, none of them offers a search history in which the user can see a structured record of their queries. On the basis of the use cases, and under the requirement that the interface integrate into the user's daily search process without additional effort, the search-interface concept was implemented as a prototype.
Care was also taken that the interface can be integrated into any web browser and is platform-independent. An evaluation of the prototype rounds off the thesis; it was conducted using the thinking-aloud method with test subjects of varying expertise in search services and research systems.
[166] Gerard Salton. Associative document retrieval techniques using bibliographic information. Journal of the ACM, 10(4):440-457, 1963.
[ bib | doi ]
[167] Gerard Salton and Chris Buckley. On the use of spreading activation methods in automatic information retrieval. In Proceedings of the ACM SIGIR, Grenoble, France, June 1988.
[ bib | www ]
Spreading activation methods have been recommended in information retrieval to expand the search vocabulary and to complement the retrieved document sets. The spreading activation strategy is reminiscent of earlier associative indexing and retrieval systems. Some spreading activation procedures are briefly described, and evaluation output is given, reflecting the effectiveness of one of the proposed procedures.
Keywords: spreading activation
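A minimal sketch of the general idea behind such methods (an editorial illustration in Python over a made-up term/document association graph, not the procedure evaluated in the paper):

    # Sketch: a few rounds of spreading activation over weighted associations
    # between query terms and documents. Graph, decay factor and seed
    # activations are illustrative assumptions.
    edges = {
        ("information", "doc1"): 0.8, ("retrieval", "doc1"): 0.6,
        ("retrieval", "doc2"): 0.7, ("doc2", "indexing"): 0.5,
        ("indexing", "doc3"): 0.9,
    }

    def neighbours(node):
        for (a, b), w in edges.items():
            if a == node:
                yield b, w
            elif b == node:
                yield a, w

    def spread(seeds, decay=0.5, rounds=2):
        activation = dict(seeds)          # initial activation from the query
        for _ in range(rounds):
            new = dict(activation)
            for node, level in activation.items():
                for other, weight in neighbours(node):
                    new[other] = new.get(other, 0.0) + decay * weight * level
            activation = new
        return activation

    print(spread({"information": 1.0, "retrieval": 1.0}))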
[168] Gerard Salton and Chris Buckley. Term-weighting approaches in automatic information retrieval. Information Processing and Management, 24(5):513-523, 1988.
[ bib ]
[169] Michael F. Schwartz and David C. M. Wood. Discovering shared interests using graph analysis. Commun. ACM, 36(8):78-89, 1993.
[ bib | doi ]
Keywords: expert finding, expert location, graph analysis
[170] John Scott. Social Network Analysis. A Handbook. SAGE Publications, London, 1991.
[ bib ]
[171] Mehul Shah. ReferralWeb: A resource location system guided by personal relations. Master's thesis, Massachusetts Institute of Technology, May 1997.
[ bib ]
[172] Craig Silverstein, Hannes Marais, Monika Henzinger, and Michael Moricz. Analysis of a very large web search engine query log. SIGIR Forum, 33(1):6-12, 1999.
[ bib | doi ]
[173] Amit Singhal, Chris Buckley, and Mandar Mitra. Pivoted document length normalization. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 21-29, New York, NY, USA, 1996. ACM Press.
[ bib | doi ]
[174] Alan F. Smeaton, Gary Keogh, Cathal Gurrin, Kieran McDonald, and Tom Sødring. Analysis of papers from twenty-five years of SIGIR conferences: What have we been doing for the last quarter of a century? SIGIR Forum, 36(2):39-43, 2002.
[ bib | doi | www ]
As part of the celebration of twenty-five years of ACM SIGIR conferences we performed a content analysis of all papers published in the proceedings of SIGIR conferences, including those from 2002. From this we determined, using information retrieval approaches of course, which topics had come and gone over the last two and a half decades, and which topics are currently "hot". We also performed a co-authorship analysis among authors of the 853 SIGIR conference papers to determine which author is the most "central" in terms of a co-authorship graph and is our equivalent of Paul Erdős in Mathematics. In the first section we report on the content analysis, leading to our prediction as to the most topical paper likely to appear at SIGIR 2003. In the second section we present details of our co-authorship analysis, revealing who is the "Christopher Lee" of SIGIR, and in the final section we give pointers to where readers who are SIGIR conference paper authors may find details of where they fit into the coauthorship graph.
[175] Steven H. Strogatz. Exploring complex networks. Nature, 410:268-276, March 2001.
[ bib | doi ]
[176] Theodore Sturgeon. Venture Science Fiction, March 1958.
[ bib ]
[177] H. Turtle and W. B. Croft. Inference networks for document retrieval. In SIGIR '90: Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval, pages 1-24, New York, NY, USA, 1990. ACM Press.
[ bib | doi ]
[178] Joshua R. Tyler, Dennis M. Wilkinson, and Bernardo A. Huberman. Email as spectroscopy: automated discovery of community structure within organizations. In Proceedings of the Communities and Technologies (C&T 2003) International Conference, pages 81-96, Deventer, The Netherlands, The Netherlands, 2003. Kluwer, B.V.
[ bib ]
[179] Trystan Upstill, Nick Craswell, and David Hawking. Query-independent evidence in home page finding. ACM Trans. Inf. Syst., 21(3):286-313, 2003.
[ bib | doi ]
[180] Ellen M. Voorhees, Narendra K. Gupta, and Ben Johnson-Laird. Learning collection fusion strategies. In SIGIR '95: Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval, pages 172-179, New York, NY, USA, 1995. ACM Press.
[ bib | doi ]
Keywords: data fusion
[181] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of `small-world' networks. Nature, 393:440-442, June 1998.
[ bib | www ]
[182] Aaron Weiss. The power of collective intelligence. netWorker, 9(3):16-23, 2005.
[ bib | doi ]
[183] Etienne Wenger. Communities of practice: Learning as a social system. Systems Thinker, 9(5), 1998.
[ bib | www ]
[184] Etienne Wenger. How we learn. Communities of practice. The social fabric of a learning organization. Healthcare Forum Journal, 39(4):20-26, 1996.
[ bib | www ]
[185] Thijs Westerveld, Wessel Kraaij, and Djoerd Hiemstra. Retrieving web pages using content, links, urls and anchors. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Tenth Text REtrieval Conference TREC-10, number 500-250 in NIST Special Publications, pages 663-672. U.S. National Institute of Standards and Technology (NIST), 2001.
[ bib | www ]
[186] Scott White and Padhraic Smyth. Algorithms for estimating relative importance in networks. In KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 266-275, New York, NY, USA, 2003. ACM Press.
[ bib | doi ]
[187] Richard C. Wilson, Edwin R. Hancock, and Bin Luo. Pattern vectors from algebraic graph theory. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7):1112-1124, July 2005.
[ bib | doi ]
Keywords: Graph matching, graph features, spectral methods
[188] T. D. Wilson. Information needs and uses: fifty years of progress. In B. C. Vickery, editor, Fifty years of information progress: a Journal of Documentation review, pages 15-51. Aslib, London, 1994.
[ bib | www ]
[189] T. D. Wilson. On user studies and information needs. Journal of Librarianship, 37(1):3-15, 1981.
[ bib | www ]
[190] Fang Wu, Bernardo A. Huberman, Lada A. Adamic, and Joshua R. Tyler. Information flow in social groups. Physica A, 337:327-335, 2004.
[ bib | www ]
[191] Jinxi Xu and W. Bruce Croft. Query expansion using local and global document analysis. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, pages 4-11, New York, NY, USA, 1996. ACM Press.
[ bib | doi ]
Automatic query expansion has long been suggested as a technique for dealing with the fundamental issue of word mismatch in information retrieval. A number of approaches to expansion have been studied and, more recently, attention has focused on techniques that analyze the corpus to discover word relationships (global techniques) and those that analyze documents retrieved by the initial query (local feedback). In this paper, we compare the effectiveness of these approaches and show that, although global analysis has some advantages, local analysis is generally more effective. We also show that using global analysis techniques, such as word context and phrase structure, on the local set of documents produces results that are both more effective and more predictable than simple local feedback.
[192] Atsushi Yamamoto, Daisuke Asahara, Tomoko Itao, Satoshi Tanaka, and Tatsuya Suda. Distributed pagerank: A distributed reputation model for open peer-to-peer networks. In SAINT-W '04: Proceedings of the 2004 Symposium on Applications and the Internet-Workshops (SAINT 2004 Workshops), page 389, Washington, DC, USA, 2004. IEEE Computer Society.
[ bib ]
This paper proposes a distributed reputation model for open peer-to-peer networks called distributed pagerank. The model is motivated by the observation that, although pagerank already satisfies the requirements of reputation models, the centralized calculation of pagerank is incompatible with peer-to-peer networks. Distributed pagerank is a decentralized approach in which each peer's pagerank is calculated from its reputation, with the relationship between peers playing the role of the link between web pages. The distributed calculation of pagerank is performed asynchronously by each peer as it communicates with other peers. This asynchronous calculation requires no extra messages for computing pagerank and steadily yields an accurate pagerank for each peer even under a dynamic topology of relationships. Simulation results indicate that the calculated pagerank of each peer converges to the original pagerank value under a static topology of relationships, and this behaviour is expected to carry over to dynamic topologies. A fully implemented application of distributed pagerank is also presented, which supports the dynamic formation of communities with reputation ranking.
[193] Dawit Yimam-Seid and Alfred Kobsa. Expert finding systems for organizations: Problem and domain analysis and the DEMOIR approach. Journal of Organizational Computing and Electronic Commerce, 13(1):1-24, 2003.
[ bib ]
Computer systems that augment the process of finding the right expert for a given problem in an organization or world-wide are becoming feasible more than ever before, thanks to the prevalence of corporate Intranets and the Internet. This paper investigates such systems in two parts. We first explore the expert finding problem in depth, review and analyze existing systems in this domain, and suggest a domain model that can serve as a framework for design and development decisions. Based on our analyses of the problem and solution spaces, we then bring to light the gaps that remain to be addressed. Finally, we present our approach called DEMOIR, which is a modular architecture for expert finding systems that is based on a centralized expertise modeling server while also incorporating decentralized components for expertise information gathering and exploitation.
Keywords: expert finding, expert location
[194] Bin Yu and Munindar P. Singh. Searching social networks. In AAMAS '03: Proceedings of the second international joint conference on Autonomous agents and multiagent systems, pages 65-72, New York, NY, USA, 2003. ACM Press.
[ bib | doi | www ]
A referral system is a multiagent system whose member agents are capable of giving and following referrals. The specific cases of interest arise where each agent has a user. The agents cooperate by giving and taking referrals so each can better help its user locate relevant information. This use of referrals mimics human interactions and can potentially lead to greater effectiveness and efficiency than in single-agent systems. Existing approaches consider what referrals may be given and treat the referring process simply as path search in a static graph. By contrast, the present approach understands referrals as arising in and influencing dynamic social networks, where the agents act autonomously based on local knowledge. This paper studies strategies using which agents may search dynamic social networks. It evaluates the proposed approach empirically for a community of AI scientists (partially derived from bibliographic data). Further, it presents a prototype system that assists users in finding other users in practical social networks.
Keywords: referrals
[195] Xiangmin Zhang and Yuelin Li. An exploratory study on knowledge sharing in information retrieval. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05), page 245pp. Computer Society Press, 2005.
[ bib | doi | www ]
[196] Dublin Core Metadata Initiative. DCMI Metadata Terms. http://dublincore.org/documents/dcmi-terms/, 2005.
[ bib ]
[197] Six Apart Ltd. LiveJournal bot policy. http://www.livejournal.com/bots/, January 2006.
[ bib ]
[198] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.
[ bib ]
[199] D. G. Bobrow and A. Collins, editors. Representation and Understanding. Academic Press, New York, 1975.
[ bib ]
[200] R. Brachman and H. Levesque, editors. Readings in Knowledge Representation. Morgan Kaufman, Los Altos, 1985.
[ bib ]
[201] William B. Frakes and Ricardo Baeza-Yates, editors. Information Retrieval. Data Structures & Algorithms. Prentice Hall, 1992.
[ bib ]
[202] Christopher Lueg and Danyel Fisher, editors. From Usenet to CoWebs. Interacting with social information spaces. Springer, 2003.
[ bib ]
[203] Gerard Salton, editor. The Smart Retrieval System. Experiments in Automatic Document Processing. Prentice Hall Inc., Englewood Cliffs NJ, 1971.
[ bib ]
[204] Beth Sundheim and Ralph Grishman, editors. MUC6 '95: Proceedings of the 6th conference on Message understanding, Morristown, NJ, USA, 1995. Association for Computational Linguistics.
[ bib ]
[205] E. M. Voorhees and Lori P. Buckland, editors. Proceedings of the Thirteenth Text REtrieval Conference (TREC 2004), number 500-261 in NIST Special Publications, Gaithersburg, MD, November 2004. U. S. National Institute of Standards and Technology.
[ bib | www ]
