impact

ideas

todo: yml in databricks
[WIP] my plan for impact in 2024 - 2025

challanges

worked on this table in Excel

learnings

new values IMPACT - INCLUSION MASTERY PURPOSE ACTION CURIOSITY TEAMWORK
databricks
- how to get info about server hostname and http_path for account
  - email_account (right corner) > User Settings> Configuration > Advanced options > JDBC/ODBC
  - for connect to databricks remotly for example sqlalchemy + alembic
- data skipping
  - https://docs.databricks.com/en/delta/data-skipping.html
- clone table to other place
  - https://docs.databricks.com/en/delta/clone.html#language-python
- import one databricks notebook into another

  %run /Shared/MyNotebook
  # or relative path:
  %run ./MyNotebook

achievements

leave as last person the office
databricks: migration clone table from hive to unity catalog

from dataclasses import dataclass
from delta import DeltaTable

from pyspark.sql import SparkSession

from typing import Optional

session = SparkSession.builder.appName("data_migration").getOrCreate()

@dataclass
class DataPath:

    catalog: str
    database: str
    table: str
    source: Optional[str] = None

    def __post_init__(self):
        if self.source is None:
            self.source = f'{self.catalog}.{self.database}.{self.table}'

source_path = DataPath('catalog1', 'database1', 'table1')
result_path = DataPath('catalog2', 'database2', 'table2')


print(source_path.source)
print(result_path.source)

# deltaTable = DeltaTable.forPath(session, "/path/to/table")  # path-based tables, or
deltaTable = DeltaTable.forName(session, source_path.full_path)    # Hive metastore-based tables
deltaTable.clone(target=result_path.full_path, isShallow=False, replace=False)

Thanks for reading this ❤️

Love,

Valentines with brother

healty