Recently, there has been an explosion of publicly available datasets, as organizations and governments realize that having data in the open is the key to unlocking new markets and ideas.
Grok is a perfect platform for consuming that data, making sense of it, and sharing the insights. Out of the box, it comes with thousands of pre-built connections to public datasets with data on economics, climate, energy, finance, and hundreds of other topics.
OpenAPI, also known as Swagger, is a popular format that describes the structure of server APIs so that machines can read the document and use the service.
The Datagrok platform integrates well with OpenAPI. Once a Swagger file is imported (you can simply drag and drop a YAML file into the app), its content is translated into Datagrok connections, queries, and functions. All of them can be combined and used in data jobs, calculations, info panels, etc.
There is a lot more that can be done in Datagrok with OpenAPI beyond simply retrieving data, and we encourage you to learn more about it.
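To make the idea of translating a Swagger file into queries more concrete, here is a minimal Python sketch that walks a tiny OpenAPI 2.0 document and lists one callable operation per path and method. It is only an illustration of the format, not Datagrok's actual import logic. The service, host, paths, and operation names in `SPEC` are hypothetical, and the spec is written as JSON (which OpenAPI also permits) so the example needs only the standard library.

```python
import json

# Hypothetical, minimal OpenAPI 2.0 (Swagger) document, for illustration only.
SPEC = json.loads("""
{
  "swagger": "2.0",
  "info": {"title": "Example Data Service", "version": "1.0"},
  "host": "api.example.com",
  "basePath": "/v1",
  "schemes": ["https"],
  "paths": {
    "/series": {
      "get": {
        "operationId": "listSeries",
        "summary": "List available time series",
        "parameters": [
          {"name": "topic", "in": "query", "type": "string", "required": false}
        ]
      }
    },
    "/series/{id}": {
      "get": {
        "operationId": "getSeries",
        "summary": "Get one time series",
        "parameters": [
          {"name": "id", "in": "path", "type": "string", "required": true}
        ]
      }
    }
  }
}
""")

# Keys of a path item that denote HTTP operations in OpenAPI 2.0.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch"}


def list_operations(spec):
    """Return one record per (path, method) operation found in the spec."""
    base_url = f"{spec['schemes'][0]}://{spec['host']}{spec.get('basePath', '')}"
    operations = []
    for path, item in spec["paths"].items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            operations.append({
                "name": op.get("operationId", f"{method} {path}"),
                "method": method.upper(),
                "url": base_url + path,
                "parameters": [p["name"] for p in op.get("parameters", [])],
            })
    return operations


for op in list_operations(SPEC):
    print(op["name"], op["method"], op["url"], op["parameters"])
```

Each record carries a name, an endpoint, and a parameter list, which is roughly the information a platform needs to expose an API operation as a parameterized query.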
The following data providers were integrated into Datagrok by simply importing their Swagger files:
- Alpha Vantage
From Wikipedia: Socrata develops and operates a government domain-specific, cloud-based data as a service platform that breaks down government data silos. This platform has the ability to ingest, store and serve all variety of public sector data workloads, from small, static data to dynamic big data including real-time, sensor-based data emitted from Internet of Things and smart city sensors and devices. The Socrata platform can store structured or unstructured operational, geospatial, financial and performance data and digital content like video footage.
Grok provides an easy way to discover, retrieve, and analyze any open dataset hosted on the Socrata platform. Look for available datasets under the Socrata connection in the 'Connect to Data' window.
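For readers who want to see what retrieving a Socrata-hosted dataset looks like outside the platform, the sketch below builds a request URL for Socrata's SODA API, which serves datasets at `/resource/<dataset id>.json` and accepts query parameters such as `$select`, `$where`, and `$limit`. It does not describe how Grok itself talks to Socrata. The helper name `soda_query_url`, the domain, the dataset id, and the column names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def soda_query_url(domain, dataset_id, select=None, where=None, limit=100):
    """Build a SODA query URL for a dataset hosted on a Socrata domain."""
    params = {}
    if select:
        params["$select"] = select  # comma-separated column list
    if where:
        params["$where"] = where    # filter expression
    params["$limit"] = limit        # maximum number of rows to return
    # Keep "$" and "," readable; everything else is percent-encoded.
    query = urlencode(params, safe="$,")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"


# Hypothetical domain, dataset id, and columns, for illustration only.
url = soda_query_url(
    "data.example.gov",
    "abcd-1234",
    select="name,year",
    where="year > 2015",
    limit=10,
)
print(url)
```

The resulting URL can be fetched with any HTTP client; the response is a JSON array with one object per row.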
Public databases hosted on Grok
In addition to accessing external datasets, there are a number of publicly available databases that we mirror and host on our platform. Some examples of these databases are:
- ChEMBL - a manually curated chemical database of bioactive molecules with drug-like properties
- MUV - a benchmark dataset selected from PubChem BioAssay by applying a refined nearest neighbor analysis. The MUV dataset contains 17 challenging tasks for around 90,000 compounds and is specifically designed for validation of virtual screening techniques