Athena

Provides access to the Amazon Athena service using SQL queries via a JDBC driver. Connection parameters:

{
  "region": "",
  "db": "",
  "S3OutputLocation": "",
  "VPCEndpoint": "",
  "S3OutputEncOption": ""
}

Supported Parameters

| Type             | Value       | Description or Example   |
|------------------|-------------|--------------------------|
| num, int, double | =           | =100                     |
|                  | >           | >1.02                    |
|                  | >=          | >=4.1                    |
|                  | <=          | <=100                    |
|                  | !=          | !=5                      |
|                  | in          | in (1, 3, 10.2)          |
|                  | min-max     | 1.5-10.0                 |
| string           | contains    | contains ea              |
|                  | starts with | starts with R            |
|                  | ends with   | ends with w              |
|                  | in          | in (ab, "c d", "e\"f\"") |
|                  | regex       | regex ^(.+)@(.+)$        |
| datetime         | anytime     |                          |
|                  | before      | before 1/1/2022          |
|                  | after       | after 1/1/2022           |
|                  | today       |                          |
|                  | this week   |                          |
|                  | this month  |                          |
|                  | this year   |                          |
|                  | last year   |                          |
|                  | min-max     |                          |
|                  | April 2021  |                          |
| list<string> (1) |             |                          |

  • (1) default parameters are not supported
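
These patterns are typically supplied through query parameters. The following is a minimal sketch, assuming Datagrok's header-comment syntax for pattern parameters ("--input: ... {pattern: ...}" applied as "@param(column)") and the northwind.products table created in the walkthrough below; the parameter names "name" and "price" are illustrative:

    --input: string name = "starts with R" {pattern: string}
    --input: double price = ">20" {pattern: double}
    SELECT productname, unitprice
    FROM northwind.products
    WHERE @name(productname)
      AND @price(unitprice)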

Supported output types

| Type                             | Supported  |
|----------------------------------|------------|
| boolean                          | ✅         |
| tinyint                          | ✅         |
| smallint                         | ✅         |
| int, integer                     | ✅         |
| bigint                           | ✅         |
| double                           | ✅         |
| float                            | ✅         |
| decimal                          | ✅         |
| char, varchar, string            | ✅         |
| date, timestamp                  | ✅         |
| array<data_type>                 | ✅ (1)     |
| map<primitive_type, data_type>   | ✅ (1)     |
| struct<col_name : data_type ...> | ✅ (1)     |
| binary                           | not tested |

  • (1) supported as a string
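
For example, complex Athena types are returned to Datagrok as their string representation. A quick way to observe this, using standard Athena (Presto) constructors; the column aliases are illustrative:

    SELECT
      ARRAY[1, 2, 3]                                    AS arr,  -- array<int>, arrives as a string
      MAP(ARRAY['a', 'b'], ARRAY[1.5, 2.5])             AS m,    -- map<varchar, double>, arrives as a string
      CAST(ROW(1, 'chai') AS ROW(id int, name varchar)) AS st    -- struct, arrives as a string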

Supported features

  • Schema browsing
  • Join DB tables
  • Aggregation query
  • Connection test
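
For instance, a join combined with an aggregation runs entirely inside Athena. The sketch below uses the northwind.products table created in step 3 of the walkthrough; the northwind.categories table is hypothetical and appears only to illustrate the join:

    SELECT c.categoryname,                  -- from the hypothetical categories table
           COUNT(*)         AS products,
           AVG(p.unitprice) AS avg_price
    FROM northwind.products p
    JOIN northwind.categories c
      ON c.categoryid = p.categoryid
    GROUP BY c.categoryname
    ORDER BY avg_price DESC;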

Simple steps to link CSV files with Athena and get results in Datagrok:

  1. Upload the CSVs to an S3 bucket. Note that Athena reads from an S3 folder rather than a single file, so CSVs with different structures must be placed in separate folders. For example:

     Bucket s3://athena-northwind/
       orders/
         orders.csv
       products/
         products.csv
  2. Create a bucket, or a folder in an existing bucket, for the Athena output. For example:

    s3://athena-northwind/results/
  3. Create a table in the Athena console. The console UI builds the SQL query that creates the table in Athena. The following example is for the Northwind "products.csv":

    CREATE EXTERNAL TABLE IF NOT EXISTS northwind.products (
      `productid` int,
      `productname` string,
      `supplierid` int,
      `categoryid` int,
      `quantityperunit` string,
      `unitprice` double,
      `unitsinstock` int,
      `unitsonorder` int,
      `reorderlevel` int,
      `discontinued` int
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES (
      'serialization.format' = ',',
      'field.delim' = ','
    )
    LOCATION 's3://athena-northwind/products/'
    TBLPROPERTIES (
      'has_encrypted_data' = 'false',
      'skip.header.line.count' = '1'
    );

    Tips:

    • To change the CSV delimiter, adjust the "serialization.format" and "field.delim" properties (a tab-delimited variant is shown after these steps)
    • To skip the header line, add the "skip.header.line.count" property
  4. Create a data connection in the Datagrok platform. The parameters may include "region", "vpc endpoint", "db", "s3OutputLocation", "s3OutputEncOption", "accessKey", "secretKey", or "connString". For example:

    • Name: northwind

    • Region: us-east-2

    • Db: northwind

    • S3 Output Location: s3://athena-northwind/results/

    • Access Key: <key>

    • Secret Key: <secret>

      Notes:

      • VPC Endpoint is optional. If not specified, the canonical endpoint "athena.[Region].amazonaws.com:443" is used
      • Do not forget the trailing "/" at the end of the "S3 Output Location" parameter
  5. Create a data query under the new connection. For example:

    SELECT * FROM products;
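
As mentioned in the tips for step 3, switching the table to another delimiter only changes the SerDe properties; the rest of the CREATE TABLE statement stays the same. A sketch for a tab-delimited file, assuming the same LazySimpleSerDe:

    WITH SERDEPROPERTIES (
      'serialization.format' = '\t',
      'field.delim' = '\t'
    )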

See also: