
Setting Up Databricks Resources

Connect your Databricks E2 account to Vocareum and provision supporting AWS cloud resources

Written by Mary Gordanier
Updated this week

For Admins

This guide covers the initial setup required to use Databricks Labs in Vocareum, supported by resources from your own Databricks E2 and AWS accounts.

Alternative Approaches

You may instead connect an Azure account by following the Cloud Labs: Bring Your Own Azure Account guide, then use first-party Databricks within Azure Cloud Labs. In this case, Vocareum helps manage usage of your Azure account but does not manage Databricks directly.

For higher education institutions, the Databricks University Alliance may be another option for supporting Databricks training in your courses. If you are interested in working with them, reach out through the Contact Us page of the Databricks Help Center.

Prerequisites

Before you begin, you need an AWS account connected to Vocareum. See the Cloud Labs: Bring Your Own AWS Account guide for detailed instructions, and email support@vocareum.com if you have any questions.

Configure your Databricks account

In your Databricks E2 account:

  1. Locate your E2 account ID and note it down

  2. Create a service principal with the account admin role

    1. Note down the service principal's UUID (also called the Application ID)

    2. Generate a secret for that service principal

      1. Set the secret's lifetime to 730 days

      2. Note down the secret

  3. Set the authentication method to SSO using the following information:

    1. SAML 2.0

    2. X.509 Certificate:

      -----BEGIN CERTIFICATE-----
      MIIDiTCCAnGgAwIBAgIBADANBgkqhkiG9w0BAQsFADBfMQswCQYDVQQGEwJVUzET
      MBEGA1UECAwKQ2FsaWZvcm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoM
      CFZvY2FyZXVtMRUwEwYDVQQDDAx2b2NhcmV1bS5jb20wHhcNMjMwMTI0MjI1MDU3
      WhcNMjUwMTIzMjI1MDU3WjBfMQswCQYDVQQGEwJVUzETMBEGA1UECAwKQ2FsaWZv
      cm5pYTERMA8GA1UEBwwIU2FuIEpvc2UxETAPBgNVBAoMCFZvY2FyZXVtMRUwEwYD
      VQQDDAx2b2NhcmV1bS5jb20wggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIB
      AQDrKf0u2WbQ+R4utxEj0hD7Stgj6SGq207kCHI+XtIThgFZTMGyVoGyeDlgTNgZ
      wG/+Qm45R7GOeRIq8gC1B4R6WidFg0xEURYE6kkqQ6CFHhqKIb144RQQyN3jfc3n
      g8CxzZrS2j5BRTKy2oiYP16xXiWMjg5qL6gXDchM/VjN6+kgXf54WGc9TT98vQWC
      yd8H2UaM43hujOlrprtr5PsQZhc9uDevcbj1YgIK+W4ox0QqNbUJPJgSrzFkukVy
      rjZKSwLLCn6FtzCu3AfYmk0/+NqdqRsPQNuReiMVkuyVO5A+jPNjxchldDg/LQkF
      KX/3lmLSybtNwfJdPtK6UKH9AgMBAAGjUDBOMB0GA1UdDgQWBBRPWM+O1uAzlxEN
      /vRXZ4gTqnKRyTAfBgNVHSMEGDAWgBRPWM+O1uAzlxEN/vRXZ4gTqnKRyTAMBgNV
      HRMEBTADAQH/MA0GCSqGSIb3DQEBCwUAA4IBAQDjRVBbPyTTCkQo8MVdEnL4Ou3w
      tfnzFhWl69O6AEUyF7RKab0FE9kCPpwh/2/6lMG6dvtnFJDfeUIEluz2mho7UqGz
      pDH72/6TDTootYvs01wSBMXof7F7ZFJ+lul7lA+4sjSrr6GcB6StaD3qENY7rG32
      8Ty16bvUZLq11kvM+6NbqQdpe9dg+9N0Ju9krg63zoox4cQDe4JRd/dH7/yZr5DO
      xcXrN7zR2QZ4duNOk/EZMNg6gLOBQ5Y+j2QcuWTZ3XtUO5j2wW6/C/AGSRhdhnon
      wmj4ZDdUr3mTZvf03+77hAbyoIdjsdyhjiYyLth1FIP+ITnPGQZBykKIWeyz
      -----END CERTIFICATE-----
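Copying a certificate out of a web page can collapse its line breaks into spaces. The following is a minimal sketch, not part of the Vocareum or Databricks tooling, that re-wraps such a block into 64-character PEM lines and checks that the payload is valid Base64 before you paste it into the SSO settings. The function name `normalize_pem` is hypothetical.

```python
import base64
import re

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def normalize_pem(text: str) -> str:
    """Re-wrap a PEM certificate whose line breaks were lost in copying."""
    match = re.search(
        re.escape(PEM_HEADER) + r"(.*?)" + re.escape(PEM_FOOTER),
        text,
        re.DOTALL,
    )
    if match is None:
        raise ValueError("no PEM certificate block found")
    # Remove every space and newline inside the Base64 payload.
    payload = "".join(match.group(1).split())
    # validate=True rejects characters outside the Base64 alphabet.
    base64.b64decode(payload, validate=True)
    # PEM bodies are conventionally wrapped at 64 characters per line.
    lines = [payload[i:i + 64] for i in range(0, len(payload), 64)]
    return "\n".join([PEM_HEADER, *lines, PEM_FOOTER]) + "\n"
```

This only checks the encoding; it does not verify the certificate's contents or validity period.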

Configure your AWS account

In the AWS account you previously connected to Vocareum:

VocareumVM Role

Create the vocareumvm IAM role and policy to allow Vocareum to provision the underlying network infrastructure in AWS in advance of launching your Databricks labs.

  1. Create a policy named "vocareumvm-policy-databricks" using the following JSON:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VocDbEc2Access",
          "Effect": "Allow",
          "Action": [
            "ec2:AllocateAddress",
            "ec2:AssociateRouteTable",
            "ec2:AttachInternetGateway",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CreateInternetGateway",
            "ec2:CreateNatGateway",
            "ec2:CreateRoute",
            "ec2:CreateRouteTable",
            "ec2:CreateSecurityGroup",
            "ec2:CreateSubnet",
            "ec2:CreateTags",
            "ec2:CreateVpc",
            "ec2:CreateVpcEndpoint",
            "ec2:DeleteInternetGateway",
            "ec2:DeleteNatGateway",
            "ec2:DeleteRoute",
            "ec2:DeleteRouteTable",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteSubnet",
            "ec2:DeleteVpc",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeAddresses",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeCustomerGateways",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeEgressOnlyInternetGateways",
            "ec2:DescribeInstances",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeNatGateways",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeRegions",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeTags",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcEndpointServiceConfigurations",
            "ec2:DescribeVpcPeeringConnections",
            "ec2:DescribeVpcs",
            "ec2:DescribeVpnConnections",
            "ec2:DescribeVpnGateways",
            "ec2:DetachInternetGateway",
            "ec2:DisassociateRouteTable",
            "ec2:ModifySubnetAttribute",
            "ec2:ModifyVpcAttribute",
            "ec2:ReleaseAddress",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupIngress"
          ],
          "Resource": "*"
        },
        {
          "Sid": "VocDbCfnAccess",
          "Effect": "Allow",
          "Action": [
            "cloudformation:CreateStack",
            "cloudformation:DeleteStack",
            "cloudformation:DescribeStackEvents",
            "cloudformation:DescribeStacks",
            "cloudformation:GetStackPolicy",
            "cloudformation:ListStacks",
            "cloudformation:UpdateTerminationProtection"
          ],
          "Resource": "*"
        }
      ]
    }

  2. Create a role named "vocareumvm"

  3. Attach the "vocareumvm-policy-databricks" policy to the "vocareumvm" role.
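Long policy documents are easy to damage when copied by hand. As a rough sketch, the helper below checks a policy JSON string for the basics before you paste it into the IAM console: that it parses, that each statement has `Effect`, `Action`, and `Resource`, that no action is listed twice, and that it stays under the 6,144-character size limit AWS applies to managed policies (whitespace excluded). The function name `check_policy` is hypothetical, and the checks are deliberately shallow; they do not replace IAM's own validation.

```python
import json

MANAGED_POLICY_MAX_CHARS = 6144  # AWS limit for managed policies, whitespace excluded


def check_policy(policy_json: str) -> list[str]:
    """Return a list of problems found in an identity-based IAM policy."""
    problems = []
    doc = json.loads(policy_json)  # raises ValueError on malformed JSON
    if doc.get("Version") != "2012-10-17":
        problems.append("unexpected or missing Version")
    statements = doc.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    for index, statement in enumerate(statements):
        for key in ("Effect", "Action", "Resource"):
            if key not in statement:
                problems.append(f"statement {index}: missing {key}")
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        duplicates = sorted({a for a in actions if actions.count(a) > 1})
        if duplicates:
            problems.append(f"statement {index}: duplicate actions {duplicates}")
    # Approximate the size check by dropping all whitespace from the text.
    size = len("".join(policy_json.split()))
    if size > MANAGED_POLICY_MAX_CHARS:
        problems.append(f"policy is {size} characters, over the managed policy limit")
    return problems
```

An empty list means none of these checks failed.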

VocareumDB Role

Create the vocareumdb IAM role and policy to enable Vocareum to create workspaces and clusters in Databricks.

  1. Create a policy named "vocareumdb-policy" using the following JSON:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Stmt1403287045000",
          "Effect": "Allow",
          "Action": [
            "ec2:AssociateIamInstanceProfile",
            "ec2:AttachVolume",
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateTags",
            "ec2:CreateVolume",
            "ec2:DeleteTags",
            "ec2:DeleteVolume",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeIamInstanceProfileAssociations",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeInternetGateways",
            "ec2:DescribeNatGateways",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribePrefixLists",
            "ec2:DescribeReservedInstancesOfferings",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVolumes",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcs",
            "ec2:DetachVolume",
            "ec2:DisassociateIamInstanceProfile",
            "ec2:ReplaceIamInstanceProfileAssociation",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RevokeSecurityGroupIngress",
            "ec2:RunInstances",
            "ec2:TerminateInstances",
            "ec2:DescribeFleetHistory",
            "ec2:ModifyFleet",
            "ec2:DeleteFleets",
            "ec2:DescribeFleetInstances",
            "ec2:DescribeFleets",
            "ec2:CreateFleet",
            "ec2:DeleteLaunchTemplate",
            "ec2:GetLaunchTemplateData",
            "ec2:CreateLaunchTemplate",
            "ec2:DescribeLaunchTemplates",
            "ec2:DescribeLaunchTemplateVersions",
            "ec2:ModifyLaunchTemplate",
            "ec2:DeleteLaunchTemplateVersions",
            "ec2:CreateLaunchTemplateVersion",
            "ec2:AssignPrivateIpAddresses",
            "ec2:GetSpotPlacementScores"
          ],
          "Resource": [
            "*"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "iam:CreateServiceLinkedRole",
            "iam:PutRolePolicy"
          ],
          "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
          "Condition": {
            "StringLike": {
              "iam:AWSServiceName": "spot.amazonaws.com"
            }
          }
        }
      ]
    }

  2. Create a role named "vocareumdb" using this custom trust policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "42b65603-f19a-4c55-9b5b-034cc37f6652"
            }
          }
        }
      ]
    }

  3. Attach the "vocareumdb-policy" to the "vocareumdb" role.
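If you prefer to script the three steps above rather than use the IAM console, they might be sketched with boto3 as follows. This is an illustration under assumptions, not Vocareum's tooling: the function names and the `policy_path` file (a local copy of the "vocareumdb-policy" JSON) are hypothetical, boto3 must be installed separately, and your AWS credentials need permission to manage IAM. The principal ARN and external ID are the values shown in the trust policy above.

```python
import json

# Principal and external ID copied from the trust policy in this guide.
TRUST_PRINCIPAL = "arn:aws:iam::414351767826:root"
EXTERNAL_ID = "42b65603-f19a-4c55-9b5b-034cc37f6652"


def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Build the custom trust policy for the vocareumdb role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def create_vocareumdb_role(policy_path: str) -> str:
    """Create the policy and role, attach them, and return the role ARN."""
    import boto3  # third-party; imported here so the builder above has no dependencies

    iam = boto3.client("iam")
    with open(policy_path) as handle:
        policy_document = handle.read()
    # Step 1: create the managed policy from the saved JSON.
    policy = iam.create_policy(
        PolicyName="vocareumdb-policy", PolicyDocument=policy_document
    )
    # Step 2: create the role with the custom trust policy.
    trust = build_trust_policy(TRUST_PRINCIPAL, EXTERNAL_ID)
    role = iam.create_role(
        RoleName="vocareumdb", AssumeRolePolicyDocument=json.dumps(trust)
    )
    # Step 3: attach the policy to the role.
    iam.attach_role_policy(RoleName="vocareumdb", PolicyArn=policy["Policy"]["Arn"])
    return role["Role"]["Arn"]
```

The returned ARN is the value Vocareum Support asks for later in this guide.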

S3 Bucket for Databricks VMs

  1. Create an S3 bucket named "vocareum-db-bucket"

    1. Make sure the bucket's region matches the region in your Vocareum organization settings under Edit Org > Custom Infra. > VM Settings.

  2. After creating the bucket, open its permissions and edit the bucket policy using the following JSON:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Grant Databricks Access",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": [
            "arn:aws:s3:::vocareum-db-bucket/*",
            "arn:aws:s3:::vocareum-db-bucket"
          ],
          "Condition": {
            "StringEquals": {
              "aws:PrincipalTag/DatabricksAccountId": "42b65603-f19a-4c55-9b5b-034cc37f6652"
            }
          }
        }
      ]
    }
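The bucket steps above could also be scripted. The sketch below assumes boto3 is installed and that your credentials can create buckets; the function names are hypothetical. The bucket name and region are parameters so that the same region as your Vocareum VM settings can be passed in, and the principal and tag value are the ones from the policy above.

```python
import json


def build_bucket_policy(bucket: str, principal_arn: str, account_tag: str) -> dict:
    """Build the bucket policy from this guide for the given bucket name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # Object-level actions need the /* ARN; ListBucket needs the bucket ARN.
                "Resource": [f"arn:aws:s3:::{bucket}/*", f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringEquals": {
                        "aws:PrincipalTag/DatabricksAccountId": account_tag
                    }
                },
            }
        ],
    }


def create_bucket_with_policy(bucket: str, region: str, policy: dict) -> None:
    """Create the bucket in the given region and apply the policy."""
    import boto3  # third-party; imported lazily so the builder has no dependencies

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default and must not be sent as a LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```

For example, `build_bucket_policy("vocareum-db-bucket", "arn:aws:iam::414351767826:root", "42b65603-f19a-4c55-9b5b-034cc37f6652")` reproduces the JSON shown above.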



Contact Vocareum Support

Send an email to support@vocareum.com to request a Databricks integration.

Be ready to provide your:

  1. Databricks E2 account ID

  2. Databricks Application ID

  3. Databricks service principal secret

  4. ARN of the vocareumdb role
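Before sending the email, it can help to sanity-check two of these values. The sketch below uses hypothetical helper names and relies only on general facts: IAM role ARNs take the form `arn:aws:iam::<account-id>:role/<role-name>` with a 12-digit AWS account ID, and the Application ID is described above as a UUID.

```python
import re
import uuid


def role_arn(aws_account_id: str, role_name: str = "vocareumdb") -> str:
    """Build the IAM role ARN for a role in the given AWS account."""
    if not re.fullmatch(r"\d{12}", aws_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"arn:aws:iam::{aws_account_id}:role/{role_name}"


def is_uuid(value: str) -> bool:
    """Loosely check that a string parses as a UUID."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True
```

You can also copy the ARN directly from the role's summary page in the IAM console.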
