Project author: kennydataml

Project description:
Framework for end-to-end Databricks platform administration using YAML. Includes cluster management, ACL permissions, and a wrapper for the SCIM & Permissions REST APIs.
Language: Python
Repository: git://github.com/kennydataml/databricks-platform-administration-framework.git
Created: 2021-02-11T15:50:50Z
Project homepage: https://github.com/kennydataml/databricks-platform-administration-framework

License: GNU General Public License v3.0



Databricks DevOps: Platform Administration Framework

Framework for end-to-end platform administration using YAML configuration files. This includes cluster management and ACL permissions.
The Python code is a wrapper for the SCIM and Permissions REST APIs.

Tech Debt

  • The SCIM API is still in preview. Switch to AAD integration once it is GA.
  • The Permissions API is still in preview. Update the endpoint in api.py once it is GA.
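
The endpoint swap noted above can be isolated behind a small helper. A minimal sketch, assuming the preview Permissions API lives under `/api/2.0/preview/permissions` and the GA path simply drops the `preview/` segment (both paths are assumptions, not taken from api.py):

```python
# Hypothetical endpoint constants; verify against api.py and the
# current Databricks Permissions API docs before relying on them.
PERMISSIONS_API_PREVIEW = "/api/2.0/preview/permissions"
PERMISSIONS_API_GA = "/api/2.0/permissions"

def permissions_url(workspace_url: str, object_type: str,
                    object_id: str, ga: bool = False) -> str:
    """Build a Permissions API URL for one securable object.

    Keeping the base path in one place means the GA switch-over
    mentioned in the tech-debt note is a one-line change.
    """
    base = PERMISSIONS_API_GA if ga else PERMISSIONS_API_PREVIEW
    return f"{workspace_url.rstrip('/')}{base}/{object_type}/{object_id}"
```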

High-Level End-to-End Deployment

  1. Create the workspace.
  2. Create secret scopes for the workspace (manual).
    • Doing this programmatically needs an AAD token, which requires an AAD application.
    • Also needs the Key Vault resource ID and Key Vault DNS name.
  3. Create a feature branch and configure ACL.yaml per environment.
    • Requires Service Principal IDs and secret scope names.
  4. Commit and open a PR to the deployment branch (develop is the default).
    • The Azure Pipelines YAML will run cluster.py, then acl.py, in that order.
    • If the pipeline doesn't exist, set up a build from the develop branch using the Azure Pipelines YAML.
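
The pipeline's script order in step 4 can be sketched in Python. A minimal illustration, not the actual pipeline code; `deployment_commands` and the hard-coded script paths are assumptions for this example:

```python
import sys

# cluster.py must run before acl.py, per the deployment order above.
SCRIPTS = ("databricks_api/cluster.py", "databricks_api/acl.py")

def deployment_commands(pat: str, workspace_url: str) -> list:
    """Build the two commands in the order the pipeline runs them."""
    return [
        [sys.executable, script, "-pat", pat, "-wu", workspace_url]
        for script in SCRIPTS
    ]

# A pipeline step could then execute each command with
# subprocess.run(cmd, check=True) so a cluster failure stops
# the run before ACLs are applied.
```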

Python sys.path

NOTE: remember to add the databricks folder to Python's sys.path.
  • local development: conda develop .
  • Azure Pipelines: there are 2 options

```yaml
# option 1 (recommended)
variables:
  - name: PYTHONPATH
    value: "%PYTHONPATH%;$(extra_path)"

# option 2
- script: python script.py
  env:
    PYTHONPATH: "%PYTHONPATH%;$(extra_path)"
```
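
For local runs without `conda develop .`, the same effect can be had at runtime. A minimal sketch, assuming the script is launched from the repo root that contains `databricks_api/`:

```python
import os
import sys

# Assumption: current working directory is the repository root
# (the "databricks folder" the note above refers to).
repo_root = os.path.abspath(".")

# Prepend rather than append so this checkout wins over any
# installed copy of the package.
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)
```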
```
.
├── README.md
├── requirements.txt          # required python libs
├── __init__.py
├───databricks_api
│     acl.py                  # ACL main script. uses ACL*.yaml
│     api.py                  # custom API classes for SCIM and Permissions API
│     base.py                 # base super classes
│     cluster.py              # cluster management main script. uses clusterconf*.yaml and clusterlib*.yaml
│     utils.py                # common utilities
│     __init__.py
├───configuration             # can duplicate as necessary
│     ACL.yaml                # ACL configuration
│     clusterconf.yaml        # cluster configuration
│     clusterlib.yaml         # cluster library configuration for all clusters in workspace
└───test                      # pytest
      ACL_expected.yaml
      ACL_template.yaml
      test_utils.py
      __init__.py
```

cluster.py usage

```
python databricks_api\cluster.py -h
usage: cluster.py [-h] -pat PERSONAL_ACCESS_TOKEN -wu WORKSPACE_URL
                  [-ccf CLUSTER_CONFIG_FILE] [-clf CLUSTER_LIBRARY_FILE]

Databricks Workspace ACL Configuration

optional arguments:
  -h, --help            show this help message and exit
  -pat PERSONAL_ACCESS_TOKEN, --personal_access_token PERSONAL_ACCESS_TOKEN
                        Personal Access Token from Admin Console
  -wu WORKSPACE_URL, --workspace_url WORKSPACE_URL
                        Workspace URL
  -ccf CLUSTER_CONFIG_FILE, --cluster_config_file CLUSTER_CONFIG_FILE
                        Default is clusterconf.yaml
  -clf CLUSTER_LIBRARY_FILE, --cluster_library_file CLUSTER_LIBRARY_FILE
                        Default is clusterlib.yaml
```
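
An argparse parser matching the help text above would look roughly like this. This is a reconstruction from the `-h` output, not the actual cluster.py source:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Recreate the CLI interface shown by `cluster.py -h`."""
    parser = argparse.ArgumentParser(
        description="Databricks Workspace ACL Configuration")
    parser.add_argument("-pat", "--personal_access_token", required=True,
                        help="Personal Access Token from Admin Console")
    parser.add_argument("-wu", "--workspace_url", required=True,
                        help="Workspace URL")
    parser.add_argument("-ccf", "--cluster_config_file",
                        default="clusterconf.yaml",
                        help="Default is clusterconf.yaml")
    parser.add_argument("-clf", "--cluster_library_file",
                        default="clusterlib.yaml",
                        help="Default is clusterlib.yaml")
    return parser
```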

clusterconf.yaml

For ML clusters, specify a value such as spark_version: 8.1.x-cpu-ml-scala2.12.
For GPU ML: spark_version: 8.1.x-gpu-ml-scala2.12. Note that GPU Spark versions do not support credential passthrough.
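
The naming pattern above can be captured in a small helper, useful when generating clusterconf.yaml entries. A sketch only; `ml_spark_version` and `supports_credential_passthrough` are hypothetical names, and the GPU check mirrors the passthrough caveat noted above:

```python
def ml_spark_version(runtime: str = "8.1.x", gpu: bool = False,
                     scala: str = "2.12") -> str:
    """Compose an ML runtime string like 8.1.x-cpu-ml-scala2.12."""
    flavor = "gpu" if gpu else "cpu"
    return f"{runtime}-{flavor}-ml-scala{scala}"

def supports_credential_passthrough(spark_version: str) -> bool:
    """GPU ML runtimes do not support credential passthrough."""
    return "-gpu-" not in spark_version
```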

acl.py usage

```
python databricks_api\acl.py -h
usage: acl.py [-h] -pat PERSONAL_ACCESS_TOKEN -wu WORKSPACE_URL [-af ACL_FILE]

Databricks Workspace ACL Configuration

optional arguments:
  -h, --help            show this help message and exit
  -pat PERSONAL_ACCESS_TOKEN, --personal_access_token PERSONAL_ACCESS_TOKEN
                        Personal Access Token from Admin Console
  -wu WORKSPACE_URL, --workspace_url WORKSPACE_URL
                        Workspace URL
  -af ACL_FILE, --acl_file ACL_FILE
                        Default is ACL.yaml
```

Resources