| name | azure-storage-file-datalake-py |
| description | Azure Data Lake Storage Gen2 SDK for Python. Use for hierarchical file systems, big data analytics, and file/directory operations. |
| risk | unknown |
| source | community |
| date_added | 2026-02-27 |
Azure Data Lake Storage Gen2 SDK for Python
Hierarchical file system for big data analytics workloads.
Installation
pip install azure-storage-file-datalake azure-identity
Environment Variables
AZURE_STORAGE_ACCOUNT_URL=https://<account>.dfs.core.windows.net
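In practice the account URL is read from this environment variable rather than hard-coded. A minimal sketch of such a helper (the function name is illustrative, not part of the SDK):

```python
import os

def account_url_from_env(var: str = "AZURE_STORAGE_ACCOUNT_URL") -> str:
    # Read the Data Lake endpoint from the environment; fail fast if unset.
    url = os.environ.get(var)
    if not url:
        raise RuntimeError(f"{var} is not set")
    return url.rstrip("/")  # normalize trailing slash before building clients
```

The returned URL can then be passed straight to DataLakeServiceClient as account_url.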
Authentication
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient
credential = DefaultAzureCredential()
account_url = "https://<account>.dfs.core.windows.net"
service_client = DataLakeServiceClient(account_url=account_url, credential=credential)
Client Hierarchy
| Client | Purpose |
|---|---|
| DataLakeServiceClient | Account-level operations |
| FileSystemClient | Container (file system) operations |
| DataLakeDirectoryClient | Directory operations |
| DataLakeFileClient | File operations |
File System Operations
file_system_client = service_client.create_file_system("myfilesystem")
file_system_client = service_client.get_file_system_client("myfilesystem")
service_client.delete_file_system("myfilesystem")
for fs in service_client.list_file_systems():
    print(fs.name)
Directory Operations
file_system_client = service_client.get_file_system_client("myfilesystem")
directory_client = file_system_client.create_directory("mydir")
directory_client = file_system_client.create_directory("path/to/nested/dir")
directory_client = file_system_client.get_directory_client("mydir")
directory_client.delete_directory()
directory_client.rename_directory(new_name="myfilesystem/newname")
File Operations
Upload File
file_client = file_system_client.get_file_client("path/to/file.txt")
with open("local-file.txt", "rb") as data:
    file_client.upload_data(data, overwrite=True)
file_client.upload_data(b"Hello, Data Lake!", overwrite=True)
file_client.append_data(data=b"chunk1", offset=0, length=6)
file_client.append_data(data=b"chunk2", offset=6, length=6)
file_client.flush_data(12)  # offset = total bytes appended so far
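Each append_data call's offset must equal the number of bytes already appended, and flush_data receives the final total. A helper that computes the (offset, length) pairs for a chunked upload (a sketch, not an SDK API):

```python
def chunk_ranges(total_size: int, chunk_size: int) -> list[tuple[int, int]]:
    # Produce (offset, length) pairs covering [0, total_size) in chunk_size steps.
    ranges = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        ranges.append((offset, length))
        offset += length
    return ranges

# Usage sketch against a DataLakeFileClient:
# for offset, length in chunk_ranges(len(data), 4 * 1024 * 1024):
#     file_client.append_data(data[offset:offset + length], offset=offset, length=length)
# file_client.flush_data(len(data))
```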
Download File
file_client = file_system_client.get_file_client("path/to/file.txt")
download = file_client.download_file()
content = download.readall()
with open("downloaded.txt", "wb") as f:
    download = file_client.download_file()
    download.readinto(f)
download = file_client.download_file(offset=0, length=100)
Delete File
file_client.delete_file()
List Contents
for path in file_system_client.get_paths():
    print(f"{'DIR' if path.is_directory else 'FILE'}: {path.name}")
for path in file_system_client.get_paths(path="mydir", recursive=False):
    print(path.name)  # immediate children only
for path in file_system_client.get_paths(path="mydir"):
    print(path.name)  # recursive=True is the default, so this lists the full subtree
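get_paths yields a flat iterator of full path names; grouping them by top-level directory is a common post-processing step. A small sketch over plain path strings (a hypothetical helper, not an SDK call):

```python
from collections import defaultdict

def group_by_top_level(names: list[str]) -> dict[str, list[str]]:
    # Bucket full path names under their first path segment.
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        top = name.split("/", 1)[0]
        groups[top].append(name)
    return dict(groups)
```

Feed it `[p.name for p in file_system_client.get_paths()]` to summarize a file system by its top-level directories.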
File/Directory Properties
properties = file_client.get_file_properties()
print(f"Size: {properties.size}")
print(f"Last modified: {properties.last_modified}")
file_client.set_metadata(metadata={"processed": "true"})
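Azure requires metadata keys to be valid C# identifiers, so validating keys locally before calling set_metadata avoids a failed round trip. A simplified check (the regex is an approximation of the service's rule; the service remains authoritative):

```python
import re

_METADATA_KEY = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def valid_metadata_keys(metadata: dict) -> bool:
    # Approximate check: letters, digits, and underscores only,
    # not starting with a digit.
    return all(_METADATA_KEY.match(k) for k in metadata)
```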
Access Control (ACL)
acl = directory_client.get_access_control()
print(f"Owner: {acl['owner']}")
print(f"Permissions: {acl['permissions']}")
directory_client.set_access_control(
    owner="user-id",
    permissions="rwxr-x---"
)
result = directory_client.update_access_control_recursive(
    acl="user:user-id:rwx"
)
print(f"Directories changed: {result.counters.directories_successful}")
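ACL strings are comma-separated entries of the form [default:]type:qualifier:permissions. A parser sketch for inspecting them locally (a hypothetical helper, not part of the SDK):

```python
def parse_acl(acl: str) -> list[dict]:
    # Split "user:id:rwx,default:group::r-x" into structured entries.
    entries = []
    for raw in acl.split(","):
        parts = raw.strip().split(":")
        default = parts[0] == "default"
        if default:
            parts = parts[1:]  # drop the "default" scope prefix
        entry_type, qualifier, perms = parts
        entries.append({
            "default": default,
            "type": entry_type,
            "qualifier": qualifier,  # empty string means the owning user/group
            "permissions": perms,
        })
    return entries
```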
Async Client
from azure.storage.filedatalake.aio import DataLakeServiceClient
from azure.identity.aio import DefaultAzureCredential
async def datalake_operations():
    credential = DefaultAzureCredential()
    async with DataLakeServiceClient(
        account_url="https://<account>.dfs.core.windows.net",
        credential=credential
    ) as service_client:
        file_system_client = service_client.get_file_system_client("myfilesystem")
        file_client = file_system_client.get_file_client("test.txt")
        await file_client.upload_data(b"async content", overwrite=True)
        download = await file_client.download_file()
        content = await download.readall()
    await credential.close()

import asyncio
asyncio.run(datalake_operations())
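For high throughput, concurrency is usually bounded with a semaphore rather than launching one task per file. The pattern below is generic asyncio; in a real script, worker would be a coroutine that calls upload_data on a file client:

```python
import asyncio

async def run_bounded(items, worker, limit: int = 8):
    # Run worker(item) for every item, with at most `limit` running concurrently.
    sem = asyncio.Semaphore(limit)

    async def guarded(item):
        async with sem:
            return await worker(item)

    # gather preserves input order in its results.
    return await asyncio.gather(*(guarded(i) for i in items))
```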
Best Practices
- Use the hierarchical namespace for file system semantics
- Use append_data + flush_data for large file uploads
- Set ACLs at the directory level; default ACL entries are inherited by newly created children
- Use the async client for high-throughput scenarios
- Use get_paths with recursive=True for a full directory listing
- Set metadata for custom file attributes
- Consider the Blob API for simple object storage use cases
When to Use
Use this skill for Azure Data Lake Storage Gen2 tasks: managing file systems, directories, and files, setting POSIX-style ACLs, or building big data analytics workloads on hierarchical storage.
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.