impact
ideas
- todo: yml in databricks
- [WIP] my plan for impact in 2024 - 2025
challanges
- worked on this table in Excel
learnings
- new values IMPACT - INCLUSION MASTERY PURPOSE ACTION CURIOSITY TEAMWORK
- databricks
- how to get info about server hostname and http_path for account
- email_account (right corner) > User Settings> Configuration > Advanced options > JDBC/ODBC
- for connect to databricks remotly for example sqlalchemy + alembic
- data skipping
- clone table to other place
- import one databricks notebook into another
- how to get info about server hostname and http_path for account
%run /Shared/MyNotebook
# or relative path:
%run ./MyNotebook
achievements
-
leave as last person the office
-
databricks: migration clone table from hive to unity catalog
from dataclasses import dataclass
from delta import DeltaTable
from pyspark.sql import SparkSession
from typing import Optional
session = SparkSession.builder.appName("data_migration").getOrCreate()
@dataclass
class DataPath:
catalog: str
database: str
table: str
source: Optional[str] = None
def __post_init__(self):
if self.source is None:
self.source = f'{self.catalog}.{self.database}.{self.table}'
source_path = DataPath('catalog1', 'database1', 'table1')
result_path = DataPath('catalog2', 'database2', 'table2')
print(source_path.source)
print(result_path.source)
# deltaTable = DeltaTable.forPath(session, "/path/to/table") # path-based tables, or
deltaTable = DeltaTable.forName(session, source_path.full_path) # Hive metastore-based tables
deltaTable.clone(target=result_path.full_path, isShallow=False, replace=False)
Thanks for reading this ❤️
Love,
KK