Skip to main content

Google BigQuery

This page succinctly describes how to use Google BigQuery to analyze data from the MultiversX blockchain.

Overview

BigQuery is Google's fully managed, serverless data warehouse that enables analysis of extremely large datasets using SQL queries and / or visual tools (such as Google Looker Studio); it also has built-in machine learning capabilities.

MultiversX Blockchain data is published to Google BigQuery, and available (for free) through the Google Cloud Marketplace. The dataset, namely bigquery-public-data.crypto_multiversx_mainnet_eu, is one of many crypto datasets that are available within Google Cloud Public Datasets. One can query these datasets for free: up to 1TB / month of free processing, every month.

The MultiversX BigQuery dataset closely resembles the set of indices of the MultiversX Elasticsearch instance. Their schema and data are approximately equivalent, the data being mirrored from the Elasticsearch instance to BigQuery at regular intervals (most tables are updated hourly, and a few are updated every 4 hours).

note

As of February 2024, the MultiversX BigQuery dataset is not updated in real-time (see above). For real-time data, use the public APIs.

note

If you experience any issue with the published dataset, please let us know.

Query from BigQuery Studio

Google BigQuery Studio is a unified workspace for Google Cloud's data analytics suite which incorporates, among others, an SQL editor (optionally assisted by AI) and Python notebooks. It is a great way to explore the MultiversX dataset, and to run queries. Below, we'll explore a few example queries.

tip

Make sure to explore the dataset, the tables and their schema before running queries. Both the schema and a data preview are available in BigQuery Studio.

How many transactions were processed on MultiversX, in the last couple of days?

SELECT
DATE(`timestamp`) `day`,
COUNT(*) `transactions`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
WHERE DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
GROUP BY `day`
ORDER BY `day` DESC

Which were the top used Smart Contracts, in the last couple of days?

SELECT
DATE(`timestamp`) `day`,
`receiver` `contract`,
COUNT(DISTINCT `sender`) `num_users`,
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
WHERE `isScCall` = true
GROUP BY `day`, `contract`
HAVING `day` >= DATE_SUB(CURRENT_DATE(), INTERVAL 3 DAY) AND `num_users` > 1000
ORDER BY `day` DESC, `num_users` DESC

What ESDT tokens have the most holders?

SELECT
`token`,
`type`,
COUNT(_id) `num_holders`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.accountsesdt`
WHERE `type` = 'FungibleESDT' OR `type` = 'MetaESDT'
GROUP BY `token`, `type`
HAVING `num_holders` > 5000
ORDER BY `num_holders` DESC

What are the transactions with the largest transferred EGLD amounts, in the last couple of days?

SELECT
`day`,
`hash`,
`sender`,
`receiver`,
`amount`
FROM (
SELECT
DATE(`timestamp`) `day`,
`_id` `hash`,
`sender`,
`receiver`,
PARSE_BIGNUMERIC(`value`) `amount`,
ROW_NUMBER() OVER (PARTITION BY DATE(`timestamp`)
ORDER BY PARSE_BIGNUMERIC(`value`) DESC) AS `row_num`
FROM
`bigquery-public-data.crypto_multiversx_mainnet_eu.transactions`
WHERE
`status` = 'success'
AND DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) )
WHERE `row_num` = 1
ORDER BY `day` DESC
LIMIT 7;

What is the (global) network hitrate, per day, in the last month?

SELECT
DATE(`timestamp`) `day`,
-- 14400 is the number of rounds per day, and 3 + 1 = 4 is the number of shards
ROUND(COUNT(*) / (14400 * 4), 4) `hit_rate`
FROM `bigquery-public-data.crypto_multiversx_mainnet_eu.blocks`
WHERE
DATE(`timestamp`) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
AND DATE(`timestamp`) < CURRENT_DATE()
GROUP BY `day`
ORDER BY `day` DESC
note

Even if BigQuery includes a generous free tier, it is important to be mindful of the costs associated with running queries. For more information, see BigQuery pricing.

If you believe that specific optimizations can be applied to the dataset (to improve query performance), please let us know.

Analyze using Looker Studio

Google Looker Studio is a powerful tool for analyzing data and creating (shareable) reports. Out of the box, it connects to BigQuery (and many other data sources), thus it's a great way to explore the MultiversX dataset.

Example of report created in Looker Studio (leveraging the MultiversX dataset in BigQuery):

img

tip

In the BigQuery Studio, you can save the results of a given query as your own BigQuery tables, then immediately import them in Looker Studio, to create visualizations and reports.

Programmatic access

One can also query datasets programmatically, using the BigQuery client libraries.

See how to query a public dataset with the BigQuery client libraries.