Web Semantics in the Clouds
Source:
IEEE Intelligent Systems, Volume 23, Issue 5 (2008)
Abstract:
In the last two years, the amount of structured data made available on the Web in semantic formats has grown by several orders of magnitude.
On the one hand, the Semantic Web Linked Data effort has published hundreds of millions of RDF-based entity descriptions online, in datasets such as DBpedia, UniProt, GeoNames and several others. On the other hand, the Web 2.0 community has increasingly embraced the idea of data portability, and as of today the first efforts have already produced billions of RDF-equivalent triples, either embedded inside HTML pages using microformats or exposed directly using eRDF and RDFa.
Incentives for exposing such data are also, finally, becoming clearer. Yahoo's SearchMonkey, for example, makes Web sites that contain structured data stand out from others by providing the most appropriate visualization for the end user on the search results page. We expect that it will not be long before search engines also use this information directly for ranking and relevance, returning qualitatively better results for queries that involve everyday entities such as events, locations and people.
Even though we are still at the beginning of the data Web era, the amount of information already available is clearly much larger than what could be contained in any current-generation triplestore, which typically runs on a single server.
While many applications will need to work with large amounts of
metadata, there is one class of applications that certainly could not
exist without the capability of accessing and processing
\emph{arbitrary} amounts of metadata: the search engines that help
other applications locate the data and services they need. For this
reason, Semantic Web search engines and large-scale services are now
the first to break ground in harnessing the power of grid computing
to scale far beyond the current generation of triplestores.
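To make the idea of scaling metadata processing beyond a single triplestore more concrete, the following is a minimal, single-process sketch of a map/reduce-style job that groups triples by subject into per-entity descriptions. MapReduce is used here only as one familiar grid-computing paradigm; it is not presented as the authors' system. The function names, the `example.org` URIs and the naive line parser (which assumes N-Triples-like lines whose subject and predicate contain no spaces, separated by single spaces) are hypothetical illustrations. In a real deployment, the map, shuffle and reduce phases would run on separate machines.

```python
import zlib
from collections import defaultdict

# Hypothetical sample input: N-Triples-like lines "subject predicate object ."
SAMPLE = [
    '<http://example.org/Berlin> <http://example.org/name> "Berlin" .',
    '<http://example.org/Berlin> <http://example.org/country> <http://example.org/Germany> .',
    '<http://example.org/Alice> <http://example.org/name> "Alice Smith" .',
]


def parse_line(line: str) -> tuple[str, str, str]:
    # Naive parser (assumption): subject and predicate contain no spaces,
    # so only the object (e.g. a literal) may contain spaces.
    body = line.strip()
    if body.endswith(" ."):
        body = body[:-2]
    s, p, o = body.split(" ", 2)
    return s, p, o


def map_phase(lines):
    # Map: emit (subject, (predicate, object)) for every triple.
    for line in lines:
        s, p, o = parse_line(line)
        yield s, (p, o)


def partition(key: str, n_reducers: int) -> int:
    # Deterministic hash partitioning, so all triples about one subject
    # land on the same (simulated) reducer.
    return zlib.crc32(key.encode("utf-8")) % n_reducers


def shuffle(pairs, n_reducers: int):
    # Shuffle: route each key/value pair to its reducer's bucket.
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[partition(key, n_reducers)][key].append(value)
    return buckets


def reduce_phase(bucket):
    # Reduce: one entity description per subject, properties sorted
    # so the output does not depend on input order.
    return {subject: sorted(values) for subject, values in bucket.items()}


def run(lines, n_reducers: int = 3):
    # Each bucket holds disjoint subjects, so merging results is safe.
    descriptions = {}
    for bucket in shuffle(map_phase(lines), n_reducers):
        descriptions.update(reduce_phase(bucket))
    return descriptions


if __name__ == "__main__":
    for subject, props in sorted(run(SAMPLE).items()):
        print(subject, props)
```

Because partitioning is by subject, the number of reducers changes only how work is spread across machines, not the resulting entity descriptions; this independence is what lets such jobs grow with the data rather than with the capacity of one server.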