Zetaris SQL Guide
CURRENTLY IN REVISION
The Zetaris SQL Guide will help you write queries natively within the Zetaris SQL Editor, as well as when connecting through external tools.
Data Source Connection and Registration Statements
DDL Statements (connected data sources and lakehouses)
DML Statements (connected data sources and lakehouses)
DCL Statements (for Zetaris)
Data Retrieval Statements (connected data sources and lakehouses)
Connection and Registration Statements
Connection and Registration statements are used to connect your data sources and lakehouses, and subsequently to register the nominated tables or files.
NOTE: For Object Store, File Source, or RESTful API connections, you will need to create a LIGHTNING DATABASE first.
- CREATE DATALAKE
- CREATE DATASOURCE
- CREATE LIGHTNING DATABASE (Filesource, Object Store, and RESTful API connections)
- CREATE TABLE
- LAKEHOUSE
- CREATE LIGHTNING FILESTORE TABLE
- CREATE LIGHTNING REST TABLE
- DROP DATASOURCE
- DROP LIGHTNING DATABASE
- DROP DATALAKE
- REGISTER DATASOURCE TABLES
- REGISTER METASTORE
- REGISTER STREAMING DATASOURCE
CREATE DATALAKE
Create initial Datalake connection
for S3:
CREATE DATALAKE [DATALAKE_NAME] OPTIONS (
storageType 'S3',
storagePath 's3a://path/to/lakehouse',
AwsAccessKeyId '<AwsAccessKeyId>',
AwsSecretAccessKey '<AwsSecretAccessKey>'
);
for Azure:
CREATE DATALAKE [DATALAKE_NAME] OPTIONS (
storageType 'AzureBlob',
storagePath 'wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory>',
AzureAccountName '<AzureAccountName>',
AzureAccessKey '<AzureAccessKey>'
);
for local:
CREATE DATALAKE [DATALAKE_NAME] OPTIONS (
storageType 'Local',
storagePath '/home/zetaris/path/to/lakehouse'
);
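As a filled-in sketch of the S3 form above (the datalake name, bucket path, and credential placeholders are all hypothetical):

```sql
-- Hypothetical example: connect an S3 bucket as a datalake named "sales_lake"
CREATE DATALAKE sales_lake OPTIONS (
    storageType 'S3',
    storagePath 's3a://acme-data/lakehouse',
    AwsAccessKeyId '<AwsAccessKeyId>',
    AwsSecretAccessKey '<AwsSecretAccessKey>'
);
```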
CREATE TABLE (lakehouse)
When creating a table, you need to reference a [DIRECTORY_NAME]: either an existing folder in the path you are currently connected to, or a new directory that will be created for you.
Create a table with schema
CREATE TABLE [DATALAKE_NAME].[DIRECTORY_NAME].[YourTableName](
-- put your schema here
-- id bigint NOT NULL COMMENT 'unique id',
-- data string
)
USING iceberg
PARTITIONED BY (
-- put your partition columns here
-- category
)
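For instance, filling in the template above with hypothetical names, a partitioned Iceberg table might look like this:

```sql
-- Hypothetical example: a partitioned table in the "orders" directory
CREATE TABLE sales_lake.orders.order_lines (
    id bigint NOT NULL COMMENT 'unique id',
    category string,
    amount double
)
USING iceberg
PARTITIONED BY (
    category
)
```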
Create a table using select
CREATE TABLE [DATALAKE_NAME].[DIRECTORY_NAME].[YourtableName]
USING iceberg
AS
-- Add your select query here
-- SELECT * from table
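A filled-in sketch of the CREATE TABLE ... AS SELECT form, again with hypothetical names; the source table is qualified with its data source name, following the namespace convention described later in this guide:

```sql
-- Hypothetical example: materialise a query result as an Iceberg table
CREATE TABLE sales_lake.orders.order_summary
USING iceberg
AS
SELECT category, count(*) AS order_count
FROM crm_db.orders
GROUP BY category
```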
Pipeline and View Statements
Pipeline and View statements are used to create and manage the virtual pipelines and views in Zetaris.
NOTE: Any Pipeline or View must be stored in a Container
- ALTER VIEW
- CREATE CONTAINER
- CREATE PIPELINE CONTAINER
- CREATE SCHEMASTORE CONTAINER
- CREATE SCHEMASTORE VIEW
- DROP VIEW
- DROP CONTAINER
- DROP PIPELINE CONTAINER
- DROP SCHEMASTORE CONTAINER
Example:
CREATE SCHEMASTORE CONTAINER <viewcontainername>;
CREATE SCHEMASTORE VIEW <viewname> WITH CONTAINER <viewcontainername> AS (…);
DROP VIEW <container_name>.<view_name>;
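Putting the statements together, a minimal sketch (container, view, table, and data source names are all hypothetical):

```sql
-- Hypothetical example: a view over a connected data source, stored in a container
CREATE SCHEMASTORE CONTAINER analytics;
CREATE SCHEMASTORE VIEW top_customers WITH CONTAINER analytics AS (
    SELECT customer_id, sum(amount) AS total
    FROM crm_db.orders
    GROUP BY customer_id
);
DROP VIEW analytics.top_customers;
```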
DDL Statements (connected databases and lakehouses)
Data Definition statements (Data Definition Language, DDL) are used to create or modify the structure of database objects in a database or in Zetaris. Zetaris supports the following Data Definition Statements:
- ALTER TABLE
- CREATE TABLE
- CREATE TABLE (Database)
- CREATE TABLE (Lakehouse)
- TRUNCATE TABLE
DML Statements (connected databases and lakehouses)
Data Manipulation statements (Data Manipulation Language, DML) are used to add, change, or delete data. Spark SQL supports the following Data Manipulation Statements:
INSERT TABLE
The INSERT statement inserts new rows into a table or overwrites the existing data in the database table, lakehouse table or file. The inserted rows can be specified by value expressions or result from a query.
Note the namespace convention for table_identifier: the table name is qualified by the name of its connected data source.
INSERT INTO <datasource_name>.<table_name> ....
- Spark Insert Statements
- Presto Insert Statements
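For example, assuming a connected data source named crm_db (hypothetical), a value-based INSERT follows the namespace convention above:

```sql
-- Hypothetical example: insert rows into a table on a connected data source
INSERT INTO crm_db.customers VALUES (1, 'Alice'), (2, 'Bob');
```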
Data Retrieval Statements
Data retrieval statements are commands used to extract specific data from databases, primarily using Structured Query Language (SQL). The most fundamental and widely used data retrieval statement is the SELECT statement, which allows users to query databases to fetch data based on defined criteria.
Note the namespace convention for table_identifier: the table name is qualified by the name of its connected data source.
SELECT column1, column2 FROM <datasource_name>.<table_name>;
- Spark Data Retrieval Statements
- Presto Data Retrieval Statements
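Because every table is qualified by its data source name, a single SELECT can join tables that live in different connected sources (all names here are hypothetical):

```sql
-- Hypothetical example: a federated join across two connected data sources
SELECT c.name, o.amount
FROM crm_db.customers c
JOIN erp_db.orders o ON c.id = o.customer_id;
```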
DCL Statements
Data Control statements (Data Control Language, DCL) allow you to manage security elements across the Zetaris platform through SQL.
- GRANT
- REVOKE
- UPSERT AUTH (for RESTful API connections)
USER
ADD USER
ADD USER WITH (
email 'someone@zetaris.com',
name 'someone',
level 'general',
password '1234567'
)
ROLE
CREATE ROLE
CREATE ROLE role_name [DESCRIBE BY "this is blah~~~"]
ASSIGN ROLE
ASSIGN USER user_name [, user_name] ...TO ROLE role_name
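A short sketch combining the two statements, with hypothetical user and role names:

```sql
-- Hypothetical example: create a role and assign two users to it
CREATE ROLE analyst DESCRIBE BY "read-only analytics users";
ASSIGN USER alice, bob TO ROLE analyst;
```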
GRANT
A user granted a privilege WITH GRANT OPTION can in turn grant the same privilege on that table to other users.
Privileges
SELECT privilege - Give read access to the data source or relation
INSERT privilege - Give insert access to the data source or relation
CACHE privilege - Give cache access to a relation
GRANT SELECT | INSERT | CACHE ON table_or_view_name TO principal_spec [,principal_spec] ... [WITH GRANT OPTION]
REVOKE
REVOKE SELECT | INSERT | CACHE ON table_or_view_name FROM principal_spec [,principal_spec] ...
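For example, granting and later revoking read access on a connected table for a role named analyst (the table and role names are hypothetical):

```sql
-- Hypothetical example: grant, then revoke, SELECT on a connected table
GRANT SELECT ON crm_db.customers TO analyst WITH GRANT OPTION;
REVOKE SELECT ON crm_db.customers FROM analyst;
```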
UPSERT AUTH
Example
UPSERT AUTH salesforce_auth REQUEST(
endpoint "https://login.salesforce.com/services/oauth2/token",
method "post",
http_encoding "urlencoded"
) HEADER (
) BODY (
grant_type "password",
username "<username>",
password "<password>",
client_id "<client_id>",
client_secret "<client_secret>"
);
Auxiliary Statements
- ANALYZE
- DESCRIBE
- DESCRIBE MASTER DATASOURCE
- DESCRIBE TABLES
- SHOW
- SHOW CACHE TABLES
- SHOW DATASOURCE TABLES
- OPTIMIZER
ANALYZE
OPTIMIZER ON/OFF
To turn the Zetaris Optimization on or off, execute the following command in the SQL Editor:
TURN OPTIMIZER "PUSHDOWN_AGGREGATION" ON | OFF datasource
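For instance, to toggle aggregation pushdown for a hypothetical data source named crm_db:

```sql
-- Hypothetical example: enable aggregation pushdown for one data source
TURN OPTIMIZER "PUSHDOWN_AGGREGATION" ON crm_db;
-- ...and disable it again:
TURN OPTIMIZER "PUSHDOWN_AGGREGATION" OFF crm_db;
```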