For Admins
This guide covers the initial setup required to use Databricks Labs in Vocareum, supported by resources from your own Databricks E2 and AWS accounts.
Alternative Approaches
You may also choose to connect an Azure account using the Cloud Labs: Bring Your Own Azure Account guide, then use the first-party Databricks offering within Azure Cloud Labs. In this case, Vocareum will help manage usage of your Azure account, but will not manage Databricks directly.
For higher education institutions, the Databricks University Alliance may be another option to support Databricks training within your courses. If you are interested in working with them, you can reach out here: Databricks Help Center | Contact Us.
Prerequisites/Considerations
Databricks SSO setup: Vocareum must be the identity provider.
This means that all UI access to Databricks must go through Vocareum.
Databricks resource limits:
maximum 3 active workspaces for standard
maximum 10 active workspaces for premium
maximum 50 active workspaces for enterprise
Configure your AWS account
2 IAM roles and 1 S3 bucket need to be created.
If you will be using a metastore with metastore-level managed storage in AWS (see https://docs.databricks.com/aws/en/data-governance/unity-catalog/create-metastore and https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started#metastore-storage), you will need to create 2 S3 buckets in total.
IAM Role: vocareumvm
The first role needs to be named vocareumvm. Use the following for the permissions policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VocDbEc2Access",
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:CreateVpcEndpoint",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRoute",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteVpc",
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeCustomerGateways",
"ec2:DescribeDhcpOptions",
"ec2:DescribeEgressOnlyInternetGateways",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVpcEndpointServiceConfigurations",
"ec2:DescribeVpcPeeringConnections",
"ec2:DescribeVpcs",
"ec2:DescribeVpnConnections",
"ec2:DescribeVpnGateways",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:ModifySubnetAttribute",
"ec2:ModifyVpcAttribute",
"ec2:ReleaseAddress",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress"
],
"Resource": "*"
},
{
"Sid": "VocDbCfnAccess",
"Effect": "Allow",
"Action": [
"cloudformation:CreateStack",
"cloudformation:DeleteStack",
"cloudformation:DescribeStackEvents",
"cloudformation:DescribeStacks",
"cloudformation:GetStackPolicy",
"cloudformation:ListStacks",
"cloudformation:UpdateTerminationProtection"
],
"Resource": "*"
}
]
}
Use the following for the trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::{{your AWS account ID}}:role/vocareumvm",
"arn:aws:iam::{{our AWS account ID}}:root"
]
},
"Action": "sts:AssumeRole"
},
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
Vocareum will provide the value to use for "our AWS account ID".
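As a minimal sketch of filling in the two placeholders in the vocareumvm trust policy, the helper below renders the policy from two account IDs and rejects values that are not 12-digit AWS account IDs. The function name and the sample IDs are hypothetical and only for illustration; the policy structure is copied from the trust policy above.

```python
import json


def vocareumvm_trust_policy(your_account_id: str, vocareum_account_id: str) -> str:
    """Render the vocareumvm trust policy with both placeholders filled in."""
    for account_id in (your_account_id, vocareum_account_id):
        # AWS account IDs are 12-digit numbers.
        if not (account_id.isascii() and account_id.isdigit() and len(account_id) == 12):
            raise ValueError(f"not a 12-digit AWS account ID: {account_id!r}")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        f"arn:aws:iam::{your_account_id}:role/vocareumvm",
                        f"arn:aws:iam::{vocareum_account_id}:root",
                    ]
                },
                "Action": "sts:AssumeRole",
            },
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Made-up account IDs; substitute your own and the one Vocareum provides.
    print(vocareumvm_trust_policy("111122223333", "444455556666"))
```

Generating the document this way avoids hand-editing mistakes such as a stray trailing comma, which would make the JSON invalid.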
IAM Role: vocareum-db
The second role can be named anything, but vocareum-db will work. Use the following for the permissions policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1403287045000",
"Effect": "Allow",
"Action": [
"ec2:AssociateIamInstanceProfile",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CancelSpotInstanceRequests",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:DeleteTags",
"ec2:DeleteVolume",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeIamInstanceProfileAssociations",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribePrefixLists",
"ec2:DescribeReservedInstancesOfferings",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSpotInstanceRequests",
"ec2:DescribeSpotPriceHistory",
"ec2:DescribeSubnets",
"ec2:DescribeVolumes",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcs",
"ec2:DetachVolume",
"ec2:DisassociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation",
"ec2:RequestSpotInstances",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:DescribeFleetHistory",
"ec2:ModifyFleet",
"ec2:DeleteFleets",
"ec2:DescribeFleetInstances",
"ec2:DescribeFleets",
"ec2:CreateFleet",
"ec2:DeleteLaunchTemplate",
"ec2:GetLaunchTemplateData",
"ec2:CreateLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:ModifyLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions",
"ec2:CreateLaunchTemplateVersion",
"ec2:AssignPrivateIpAddresses",
"ec2:GetSpotPlacementScores"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole",
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "spot.amazonaws.com"
}
}
}
]
}
Use the following for the trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::414351767826:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "{{your Databricks E2 account ID}}"
}
}
}
]
}
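The policies in this guide use `{{...}}` placeholders that must be replaced before the JSON is pasted into AWS. The following is a small, hypothetical pre-flight check (the function name and messages are assumptions, not part of any Vocareum or AWS tooling) that reports leftover placeholders and JSON syntax errors.

```python
import json
import re

# Matches template markers such as {{your Databricks E2 account ID}}.
PLACEHOLDER = re.compile(r"\{\{[^{}]*\}\}")


def check_policy(text: str) -> list[str]:
    """Return a list of problems found in a policy document (empty if none)."""
    problems = [f"unreplaced placeholder: {m}" for m in PLACEHOLDER.findall(text)]
    try:
        policy = json.loads(text)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc.msg} at line {exc.lineno}")
        return problems
    if not isinstance(policy, dict) or policy.get("Version") != "2012-10-17":
        problems.append('expected top-level "Version": "2012-10-17"')
    return problems
```

An empty list means only that these basic checks passed; it does not validate the policy against AWS itself.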
S3 Bucket: vocareum-db-bucket
The S3 bucket can be named anything, but vocareum-db-bucket will work. If you choose a different name, update both Resource ARNs in the policy to match. Use the following for the bucket permissions policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Grant Databricks Access",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::414351767826:root"
},
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::vocareum-db-bucket/*",
"arn:aws:s3:::vocareum-db-bucket"
],
"Condition": {
"StringEquals": {
"aws:PrincipalTag/DatabricksAccountId": "{{your Databricks E2 account ID}}"
}
}
}
]
}
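Because the bucket name appears twice in the Resource list, a different bucket name means two edits that are easy to miss. The sketch below builds the same bucket policy from a bucket name and a Databricks account ID. The function name and the sample values are hypothetical; the principal ARN, actions, and condition key are copied from the policy above.

```python
import json

# Databricks principal as given in the bucket policy in this guide.
DATABRICKS_PRINCIPAL = "arn:aws:iam::414351767826:root"


def bucket_policy(bucket_name: str, databricks_account_id: str) -> dict:
    """Build the bucket policy for an arbitrary bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": DATABRICKS_PRINCIPAL},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # Object-level and bucket-level ARNs must both name the bucket.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalTag/DatabricksAccountId": databricks_account_id
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(json.dumps(bucket_policy("my-labs-bucket", "example-account-id"), indent=2))
```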
Configure your Databricks account
SSO
Configure the authentication with the following values:
Identity protocol: SAML 2.0
Single Sign-On URL: https://labs.vocareum.com/idp/databricks.php
Entity ID: https://labs.vocareum.com/idp/metadata.php
X.509 Certificate:
-----BEGIN CERTIFICATE-----
MIIDiTCCAnGgAwIBAgIBADANBgkqhkiG9w0BAQsFADBfMQswCQYDVQQGEwJVUzET
MBEGA1UECAwKQ2FsaWZvcm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoM
CFZvY2FyZXVtMRUwEwYDVQQDDAx2b2NhcmV1bS5jb20wHhcNMjMwMTI0MjI1MDU3
WhcNMjUwMTIzMjI1MDU3WjBfMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZv
cm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoMCFZvY2FyZXVtMRUwEwYD
VQQDDAx2b2NhcmV1bS5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB
AQDrKf0u2WbQ+R4utxEj0hD7Stgj6SGq207kCHI+XtIThgFZTMGyVoGyeDlgTNgZ
wG/+Qm45R7GOeRIq8gC1B4R6WidFg0xEURYE6kkqQ6CFHhqKIb144RQQyN3jfc3n
g8CxzZrS2j5BRTKy2oiYP16xXiWMjg5qL6gXDchM/VjN6+kgXf54WGc9TT98vQWC
yd8H2UaM43hujOlrprtr5PsQZhc9uDevcbj1YgIK+W4ox0QqNbUJPJgSrzFkukVy
rjZKSwLLCn6FtzCu3AfYmk0/+NqdqRsPQNuReiMVkuyVO5A+jPNjxchldDg/LQkF
KX/3lmLSybtNwfJdPtK6UKH9AgMBAAGjUDBOMB0GA1UdDgQWBBRPWM+O1uAzlxEN
/vRXZ4gTqnKRyTAfBgNVHSMEGDAWgBRPWM+O1uAzlxEN/vRXZ4gTqnKRyTAMBgNV
HRMEBTADAQH/MA0GCSqGSIb3DQEBCwUAA4IBAQDjRVBbPyTTCkQo8MVdEnL4Ou3w
tfnzFhWl69O6AEUyF7RKab0FE9kCPpwh/2/6lMG6dvtnFJDfeUIEluz2mho7UqGz
pDH72/6TDTootYvs01wSBMXof7F7ZFJ+lul7lA+4sjSrr6GcB6StaD3qENY7rG32
8Ty16bvUZLq11kvM+6NbqQdpe9dg+9N0Ju9krg63zoox4cQDe4JRd/dH7/yZr5DO
xcXrN7zR2QZ4duNOk/EZMNg6gLOBQ5Y+j2QcuWTZ3XtUO5j2wW6/C/AGSRhdhnon
wmj4ZDdUr3mTZvf03+77hAbyoIdjsdyhjiYyLth1FIP+ITnPGQZBykKIWeyz
-----END CERTIFICATE-----
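If the certificate is copied from a page where its line breaks were collapsed into spaces, some tools may not accept it as PEM. The following sketch, with a hypothetical function name, rebuilds standard PEM layout (64-character lines between the BEGIN and END markers) and checks that the body is valid base64. It does not verify the certificate's signature or validity dates.

```python
import base64

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def rewrap_pem(flattened: str) -> str:
    """Re-wrap a PEM certificate whose line breaks were replaced by spaces."""
    text = flattened.strip()
    if not (text.startswith(BEGIN) and text.endswith(END)):
        raise ValueError("missing BEGIN/END CERTIFICATE markers")
    # Drop the markers and all whitespace, leaving only the base64 body.
    body = "".join(text[len(BEGIN):-len(END)].split())
    # Raises binascii.Error (a ValueError subclass) if the body is not base64.
    base64.b64decode(body, validate=True)
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([BEGIN, *lines, END]) + "\n"
```

The output can be saved to a `.pem` file or pasted into the certificate field.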
Service Principal
Create 1 service principal.
Assign the "Account admin" role to it.
Generate an OAuth secret for that service principal.
Note down the secret.
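Before sharing the secret, it may be worth confirming that it works. The sketch below only builds the HTTP request and does not send it. It assumes the account-level OAuth client-credentials token endpoint on accounts.cloud.databricks.com and the `all-apis` scope, neither of which is stated in this guide, so check both against the Databricks documentation for your account. The function name and sample values are hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    # Assumed endpoint; verify against the Databricks documentation.
    url = f"https://accounts.cloud.databricks.com/oidc/accounts/{account_id}/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # The service principal's Application ID and secret go in HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; a JSON response containing an access token would suggest that the Application ID and secret are valid.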
Metastore
If you will be using a metastore, you can also create it at this time (see https://docs.databricks.com/aws/en/data-governance/unity-catalog/create-metastore).
Information We Need
Because some of this information is sensitive, contact Vocareum to ask who it should be sent to.
AWS role ARN for the vocareum-db role
AWS S3 bucket name
Databricks E2 account ID
Databricks service principal's secret
Databricks service principal's secret creation date and lifetime (days)
Databricks service principal's ID
Databricks service principal's UUID, a.k.a., Application ID
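The secret's creation date and lifetime together determine when it expires and needs to be replaced. As a small illustration, with hypothetical function names, the expiry date and the remaining days can be computed like this:

```python
from datetime import date, timedelta


def secret_expiry(created: date, lifetime_days: int) -> date:
    """Return the date on which a secret created on `created` expires."""
    if lifetime_days <= 0:
        raise ValueError("lifetime_days must be positive")
    return created + timedelta(days=lifetime_days)


def days_remaining(created: date, lifetime_days: int, today: date) -> int:
    """Return how many days are left before the secret expires (negative if past)."""
    return (secret_expiry(created, lifetime_days) - today).days


if __name__ == "__main__":
    # Made-up example: a secret created on 2024-01-01 with a 90-day lifetime.
    print(secret_expiry(date(2024, 1, 1), 90))
```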