For Admins
This guide covers the initial setup required to use Databricks Labs in Vocareum, supported by resources from your own Databricks E2 and AWS accounts.
Alternative Approaches
You may also connect an Azure account using the Cloud Labs: Bring Your Own Azure Account guide, and then use first-party Azure Databricks within Azure Cloud Labs. In this case, Vocareum helps manage usage of your Azure account but does not manage Databricks directly.
For higher education institutions, the Databricks University Alliance may be another option for supporting Databricks training in your courses. To contact them, use Databricks Help Center | Contact Us.
Prerequisites
Before beginning this process, you will need to have an AWS account connected to Vocareum. Please take a look at the Cloud Labs: Bring Your Own AWS Account guide for detailed instructions, and email support@vocareum.com if you have any questions.
Configure your Databricks account
In your Databricks E2 account:
Locate your E2 account ID and note it down.
Create a service principal and assign it the account admin role.
Note down the service principal's UUID, also called its Application ID.
Generate a secret for the service principal with a lifetime of 730 days.
Note down the secret.
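If you want to confirm that the Application ID and secret work before sending them to Vocareum, you can request an account-level OAuth token with them. The Python sketch below builds that request using only the standard library. The endpoint is the one Databricks documents for OAuth machine-to-machine authentication on AWS accounts. The function name build_token_request is hypothetical. The function only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request

# Databricks account console host for AWS (E2) accounts.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_token_request(account_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request
    for a Databricks service principal at the account level."""
    url = f"{ACCOUNTS_HOST}/oidc/accounts/{account_id}/v1/token"
    # Form-encoded body for the OAuth 2.0 client-credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")
    # HTTP Basic auth: the Application ID is the user name, the secret is the password.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {creds}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

To send the request, pass it to urllib.request.urlopen(). A successful response is a JSON object that includes an access_token field.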
Set the authentication method to SSO using the following information:
Protocol: SAML 2.0
Single Sign-On URL: https://labs.vocareum.com/idp/databricks.php
Entity ID: https://labs.vocareum.com/idp/metadata.php
x.509 Certificate:
-----BEGIN CERTIFICATE-----
MIIDiTCCAnGgAwIBAgIBADANBgkqhkiG9w0BAQsFADBfMQswCQYDVQQGEwJVUzET
MBEGA1UECAwKQ2FsaWZvcm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoM
CFZvY2FyZXVtMRUwEwYDVQQDDAx2b2NhcmV1bS5jb20wHhcNMjMwMTI0MjI1MDU3
WhcNMjUwMTIzMjI1MDU3WjBfMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZv
cm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoMCFZvY2FyZXVtMRUwEwYD
VQQDDAx2b2NhcmV1bS5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB
AQDrKf0u2WbQ+R4utxEj0hD7Stgj6SGq207kCHI+XtIThgFZTMGyVoGyeDlgTNgZ
wG/+Qm45R7GOeRIq8gC1B4R6WidFg0xEURYE6kkqQ6CFHhqKIb144RQQyN3jfc3n
g8CxzZrS2j5BRTKy2oiYP16xXiWMjg5qL6gXDchM/VjN6+kgXf54WGc9TT98vQWC
yd8H2UaM43hujOlrprtr5PsQZhc9uDevcbj1YgIK+W4ox0QqNbUJPJgSrzFkukVy
rjZKSwLLCn6FtzCu3AfYmk0/+NqdqRsPQNuReiMVkuyVO5A+jPNjxchldDg/LQkF
KX/3lmLSybtNwfJdPtK6UKH9AgMBAAGjUDBOMB0GA1UdDgQWBBRPWM+O1uAzlxEN
/vRXZ4gTqnKRyTAfBgNVHSMEGDAWgBRPWM+O1uAzlxEN/vRXZ4gTqnKRyTAMBgNV
HRMEBTADAQH/MA0GCSqGSIb3DQEBCwUAA4IBAQDjRVBbPyTTCkQo8MVdEnL4Ou3w
tfnzFhWl69O6AEUyF7RKab0FE9kCPpwh/2/6lMG6dvtnFJDfeUIEluz2mho7UqGz
pDH72/6TDTootYvs01wSBMXof7F7ZFJ+lul7lA+4sjSrr6GcB6StaD3qENY7rG32
8Ty16bvUZLq11kvM+6NbqQdpe9dg+9N0Ju9krg63zoox4cQDe4JRd/dH7/yZr5DO
xcXrN7zR2QZ4duNOk/EZMNg6gLOBQ5Y+j2QcuWTZ3XtUO5j2wW6/C/AGSRhdhnon
wmj4ZDdUr3mTZvf03+77hAbyoIdjsdyhjiYyLth1FIP+ITnPGQZBykKIWeyz
-----END CERTIFICATE-----
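Some SAML configuration forms expect the certificate in standard PEM layout, with 64 base64 characters per line, rather than on a single line. The sketch below re-wraps a pasted certificate into that layout. It also checks that the body decodes to something shaped like a DER certificate. The helper name normalize_pem is hypothetical, and the helper does not validate the certificate itself.

```python
import base64

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def normalize_pem(text: str) -> str:
    """Re-wrap a PEM certificate that was pasted on one line (or with stray
    spaces) into standard PEM layout: 64 base64 characters per line."""
    start = text.index(BEGIN) + len(BEGIN)  # raises ValueError if the header is missing
    stop = text.index(END, start)           # raises ValueError if the footer is missing
    # Remove every space, tab, and newline between the header and the footer.
    b64 = "".join(text[start:stop].split())
    # validate=True rejects characters outside the base64 alphabet.
    der = base64.b64decode(b64, validate=True)
    # A DER-encoded X.509 certificate is an ASN.1 SEQUENCE, whose first byte is 0x30.
    if not der or der[0] != 0x30:
        raise ValueError("decoded data does not look like a DER certificate")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([BEGIN, *lines, END]) + "\n"
```

Decoding the certificate shown above suggests it is valid from 24 January 2023 to 23 January 2025 (UTC). If Databricks reports that the certificate has expired, ask Vocareum Support for a current one.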
Configure your AWS account
In the AWS account you previously connected to Vocareum:
VocareumVM Role
Create the vocareumvm IAM role and policy to allow Vocareum to provision the underlying network infrastructure in AWS in advance of launching your Databricks labs.
Name the policy "vocareumvm-policy-databricks" and use the following JSON:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VocDbEc2Access",
"Effect": "Allow",
"Action": [
"ec2:AllocateAddress",
"ec2:AssociateRouteTable",
"ec2:AttachInternetGateway",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateInternetGateway",
"ec2:CreateNatGateway",
"ec2:CreateRoute",
"ec2:CreateRouteTable",
"ec2:CreateSecurityGroup",
"ec2:CreateSubnet",
"ec2:CreateTags",
"ec2:CreateVpc",
"ec2:CreateVpcEndpoint",
"ec2:DeleteInternetGateway",
"ec2:DeleteNatGateway",
"ec2:DeleteRoute",
"ec2:DeleteRouteTable",
"ec2:DeleteSecurityGroup",
"ec2:DeleteSubnet",
"ec2:DeleteVpc",
"ec2:DescribeAccountAttributes",
"ec2:DescribeAddresses",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeCustomerGateways",
"ec2:DescribeDhcpOptions",
"ec2:DescribeEgressOnlyInternetGateways",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeRegions",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcEndpoints",
"ec2:DescribeVpcEndpointServiceConfigurations",
"ec2:DescribeVpcPeeringConnections",
"ec2:DescribeVpcs",
"ec2:DescribeVpnConnections",
"ec2:DescribeVpnGateways",
"ec2:DetachInternetGateway",
"ec2:DisassociateRouteTable",
"ec2:ModifySubnetAttribute",
"ec2:ModifyVpcAttribute",
"ec2:ReleaseAddress",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress"
],
"Resource": "*"
},
{
"Sid": "VocDbCfnAccess",
"Effect": "Allow",
"Action": [
"cloudformation:CreateStack",
"cloudformation:DeleteStack",
"cloudformation:DescribeStackEvents",
"cloudformation:DescribeStacks",
"cloudformation:GetStackPolicy",
"cloudformation:ListStacks",
"cloudformation:UpdateTerminationProtection"
],
"Resource": "*"
}
]
}
Create a role named "vocareumvm".
Attach the "vocareumvm-policy-databricks" policy to the "vocareumvm" role.
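The policy has to be pasted into the IAM console exactly, so it can help to check a saved copy first. The Python sketch below parses a policy document and checks the fields this guide relies on. It then counts the allowed actions per AWS service prefix. The helper name summarize_policy is hypothetical. The sketch checks structure only and does not replace the validation the IAM console performs when you save a policy.

```python
import json
from collections import Counter


def summarize_policy(policy_json: str) -> Counter:
    """Parse an IAM policy document, check its basic shape, and count the
    allowed actions per service prefix (for example "ec2" or "cloudformation")."""
    doc = json.loads(policy_json)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if doc.get("Version") != "2012-10-17":
        raise ValueError('expected "Version": "2012-10-17"')
    statements = doc["Statement"]
    # "Statement" may be a single object or a list of objects.
    if isinstance(statements, dict):
        statements = [statements]
    counts: Counter = Counter()
    for stmt in statements:
        for key in ("Effect", "Action", "Resource"):
            if key not in stmt:
                raise ValueError(f"statement {stmt.get('Sid', '<no Sid>')} is missing {key}")
        actions = stmt["Action"]
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        if stmt["Effect"] == "Allow":
            # "ec2:CreateVpc" -> "ec2"
            counts.update(action.split(":", 1)[0] for action in actions)
    return counts
```

For the vocareumvm-policy-databricks document above, the sketch should report 51 ec2 actions and 7 cloudformation actions.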
VocareumDB Role
Create the vocareumdb IAM role and policy to enable Vocareum to create workspaces and clusters in Databricks.
Create a policy named "vocareumdb-policy" with the following JSON:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Stmt1403287045000",
"Effect": "Allow",
"Action": [
"ec2:AssociateIamInstanceProfile",
"ec2:AttachVolume",
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CancelSpotInstanceRequests",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:DeleteTags",
"ec2:DeleteVolume",
"ec2:DescribeAvailabilityZones",
"ec2:DescribeIamInstanceProfileAssociations",
"ec2:DescribeInstanceStatus",
"ec2:DescribeInstances",
"ec2:DescribeInternetGateways",
"ec2:DescribeNatGateways",
"ec2:DescribeNetworkAcls",
"ec2:DescribePrefixLists",
"ec2:DescribeReservedInstancesOfferings",
"ec2:DescribeRouteTables",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSpotInstanceRequests",
"ec2:DescribeSpotPriceHistory",
"ec2:DescribeSubnets",
"ec2:DescribeVolumes",
"ec2:DescribeVpcAttribute",
"ec2:DescribeVpcs",
"ec2:DetachVolume",
"ec2:DisassociateIamInstanceProfile",
"ec2:ReplaceIamInstanceProfileAssociation",
"ec2:RequestSpotInstances",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances",
"ec2:TerminateInstances",
"ec2:DescribeFleetHistory",
"ec2:ModifyFleet",
"ec2:DeleteFleets",
"ec2:DescribeFleetInstances",
"ec2:DescribeFleets",
"ec2:CreateFleet",
"ec2:DeleteLaunchTemplate",
"ec2:GetLaunchTemplateData",
"ec2:CreateLaunchTemplate",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeLaunchTemplateVersions",
"ec2:ModifyLaunchTemplate",
"ec2:DeleteLaunchTemplateVersions",
"ec2:CreateLaunchTemplateVersion",
"ec2:AssignPrivateIpAddresses",
"ec2:GetSpotPlacementScores"
],
"Resource": [
"*"
]
},
{
"Effect": "Allow",
"Action": [
"iam:CreateServiceLinkedRole",
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
"Condition": {
"StringLike": {
"iam:AWSServiceName": "spot.amazonaws.com"
}
}
}
]
}
Create a role named "vocareumdb" using this custom trust policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::414351767826:root"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "42b65603-f19a-4c55-9b5b-034cc37f6652"
}
}
}
]
}
Attach the "vocareumdb-policy" policy to the "vocareumdb" role.
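The trust policy above lets the Databricks AWS account (414351767826) assume the vocareumdb role, but only when it supplies the matching sts:ExternalId. The sketch below builds the same document from a parameter, which makes the one value that might differ explicit. This guide uses 42b65603-f19a-4c55-9b5b-034cc37f6652. Databricks' own documentation for cross-account roles uses the customer's Databricks account ID as the external ID, so confirm with Vocareum Support which value applies to you. The function name build_trust_policy is hypothetical.

```python
import json

# AWS account that Databricks uses to assume cross-account roles (as in the trust policy above).
DATABRICKS_AWS_ACCOUNT = "414351767826"


def build_trust_policy(external_id: str) -> dict:
    """Return a trust policy that lets the Databricks AWS account assume this
    role only when it presents the given sts:ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
```

Running print(json.dumps(build_trust_policy("42b65603-f19a-4c55-9b5b-034cc37f6652"), indent=2)) reproduces the trust policy shown above, apart from whitespace.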
S3 Bucket for Databricks VMs
Create an S3 bucket named "vocareum-db-bucket".
Make sure the bucket's region matches the region set in your Vocareum organization settings under Edit Org > Custom Infra. > VM Settings.
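If you create the bucket with a script rather than in the S3 console, you have to set the region explicitly. The sketch below assumes boto3, the AWS SDK for Python. It only builds arguments, so it runs without boto3 installed. S3 treats us-east-1 as its default location and rejects it as an explicit LocationConstraint. In the other direction, get_bucket_location reports us-east-1 buckets with a LocationConstraint of None, and the second helper accounts for this. Both helper names are hypothetical.

```python
from typing import Optional


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Keyword arguments for boto3's S3 create_bucket() that place the
    bucket in `region`."""
    kwargs: dict = {"Bucket": bucket}
    # us-east-1 is S3's default location and is rejected as an explicit
    # LocationConstraint; every other region must be named explicitly.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def region_from_location(location_constraint: Optional[str]) -> str:
    """Map the LocationConstraint returned by get_bucket_location() to a
    region name; S3 reports us-east-1 buckets with a null constraint."""
    return location_constraint or "us-east-1"
```

With boto3 installed, set region to the value from VM Settings and create the bucket with boto3.client("s3", region_name=region).create_bucket(**create_bucket_kwargs("vocareum-db-bucket", region)). Afterward, check the bucket with region_from_location(s3.get_bucket_location(Bucket="vocareum-db-bucket")["LocationConstraint"]).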
After creating the bucket, open its Permissions tab and edit the bucket policy, using the following JSON:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "Grant Databricks Access",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::414351767826:root"
},
"Action": [
"s3:GetObject",
"s3:GetObjectVersion",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::vocareum-db-bucket/*",
"arn:aws:s3:::vocareum-db-bucket"
],
"Condition": {
"StringEquals": {
"aws:PrincipalTag/DatabricksAccountId": "42b65603-f19a-4c55-9b5b-034cc37f6652"
}
}
}
]
}
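The bucket name appears twice in this policy. It appears once for bucket-level actions such as s3:ListBucket, and once with /* for object-level actions such as s3:GetObject. S3 bucket names are globally unique within AWS, so if "vocareum-db-bucket" is already taken you will need a different name. Check with Vocareum Support before using one. The sketch below derives both resource ARNs from a single name so they cannot drift apart. The helper name build_bucket_policy is hypothetical.

```python
# AWS account that Databricks uses to access customer buckets (as in the policy above).
DATABRICKS_AWS_ACCOUNT = "414351767826"

S3_ACTIONS = [
    "s3:GetObject",
    "s3:GetObjectVersion",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:ListBucket",
    "s3:GetBucketLocation",
]


def build_bucket_policy(bucket: str, databricks_account_id: str) -> dict:
    """Return the bucket policy above for an arbitrary bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT}:root"},
                "Action": list(S3_ACTIONS),
                # Object-level ARN (with /*) and bucket-level ARN (without).
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalTag/DatabricksAccountId": databricks_account_id
                    }
                },
            }
        ],
    }
```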
Contact Vocareum Support
Send an email to support@vocareum.com to request a Databricks integration.
Be ready to provide your:
Databricks E2 account ID
Databricks service principal Application ID
Databricks service principal secret
ARN of the vocareumdb role
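You can copy the ARN of the vocareumdb role from the role's summary page in the IAM console. You can also print it with aws iam get-role --role-name vocareumdb --query Role.Arn --output text. For reference, the sketch below shows how that ARN is composed. It assumes the role was created in the standard aws partition without a custom path, which is the console default. The helper name iam_role_arn is hypothetical.

```python
import re


def iam_role_arn(aws_account_id: str, role_name: str = "vocareumdb") -> str:
    """Compose the ARN of an IAM role that has no custom path."""
    # AWS account IDs are exactly 12 digits.
    if not re.fullmatch(r"\d{12}", aws_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"arn:aws:iam::{aws_account_id}:role/{role_name}"
```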