Wednesday, February 12, 2020

Google's Dataset Search: access to 25 million free datasets

Dataset Search is Google's search engine for datasets available online. It was first launched as a beta version in 2018 and has now come out of beta with better dataset descriptions and search filters. It has indexed about 25 million datasets. It offers a single place to search for datasets and provides links to where the data is hosted. The product is aimed mainly at academic researchers, students, business analysts, and data scientists, but it is not limited to them. It covers almost any subject that might interest you; the most heavily covered topics are geoscience, agriculture, and biology.


Features of Dataset Search

Based on feedback from people who tried the beta version over the past year, Google has added some new features:
  • Dataset Search is now also available on mobile.
  • Results can be filtered by the type of data you want, such as tables, text, or images.
  • Results can also be filtered by whether the data is free from the provider.
  • When you search for geographical areas, the results also include maps.

Anyone who publishes datasets online and wants them to be discoverable in Dataset Search can use Schema.org, an open standard for describing the properties of a dataset in a structured format on a web page.

Schema.org

To publish data on the Internet, whether on web pages or in email messages, it needs to be described in a structured format so that it can be easily located and retrieved. A shared vocabulary is a collection of entities or concepts (real-world objects, people, places, events) together with their semantic relationships and actions. Entities are described in such a way that their descriptions are interlinked, which helps users with search, navigation, information retrieval, and question answering. Schema.org is one such shared vocabulary; webmasters and developers can build the structure of their web pages on it and get the maximum benefit from their efforts.

“Schema.org is a collaborative, community activity with a mission to create, maintain, and promote schemas for structured data on the Internet, on web pages, in email messages, and beyond. Founded by Google, Microsoft, Yahoo, and Yandex, Schema.org vocabularies are developed by an open community process, using the public-schemaorg@w3.org mailing list and through GitHub.” [1]
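To make this concrete, here is a minimal sketch of what describing a dataset with the Schema.org vocabulary might look like. It builds a JSON-LD description using the Schema.org Dataset and DataDownload types and wraps it in a script tag that could be placed in a web page. The helper names (build_dataset_jsonld, to_script_tag), the dataset, and all example.org URLs are hypothetical and chosen only for illustration; which properties a publisher should actually fill in depends on their data and on Google's current guidelines.

```python
import json


def build_dataset_jsonld(name, description, url, license_url,
                         download_url, encoding_format, is_free=True):
    """Return a dict describing one dataset with the Schema.org Dataset type."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # Whether the data can be obtained free of charge.
        "isAccessibleForFree": is_free,
        # One downloadable file; a real dataset may list several.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(jsonld):
    """Serialize the description as a JSON-LD script tag for a web page."""
    # Escape "</" so the JSON text cannot close the script tag early.
    body = json.dumps(jsonld, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical dataset, used only to illustrate the markup.
example = build_dataset_jsonld(
    name="Example rainfall measurements",
    description="Hypothetical daily rainfall readings, for illustration only.",
    url="https://example.org/datasets/rainfall",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    download_url="https://example.org/datasets/rainfall.csv",
    encoding_format="text/csv",
)

print(to_script_tag(example))
```

The printed script tag would go into the HTML of the page that presents the dataset. The description is kept as a plain dictionary so that further Schema.org properties, such as creator or keywords, can be added without changing the serialization step.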

In the Context of Libraries

Libraries of scientific and research organizations hold vast collections of datasets, but they still rely on traditional search engines. By describing the properties of their datasets with Schema.org, they can make those datasets discoverable through Google's Dataset Search. This makes the datasets more accessible to researchers and scientists. Some libraries across the world already have datasets that are searchable through Dataset Search.


Overall, Dataset Search has received a positive response from the scientific community. It has encouraged older institutions and organizations to publish their data with proper metadata standards so that it is discoverable on Dataset Search. As a result, scientific data should become more accessible in the future.


Glossary

Beta version: A version of a piece of software that is made available for testing, typically by a limited number of users outside the company that is developing it, before its general release.

Dataset: A collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.

Semantic Relationship: Any relationship between two or more words based on the meaning of the words.


References


