| Field | Value |
|---|---|
| name | clickzetta-java-sdk |
| description | Use the ClickZetta Java SDK to write data to Lakehouse tables in batch or in real time. Covers complete usage patterns for BulkloadStream (local file/database batch uploads) and RealtimeStream (Kafka real-time consumption and writes), including Maven dependencies, connection URL formats, row write APIs, status monitoring, Options tuning, and common error handling. Trigger when users say "Java SDK", "BulkloadStream", "RealtimeStream", "write to Lakehouse with Java", "Java batch upload", "Kafka Java write", "clickzetta-java", "Maven dependency", "Java data import", "Java 写入 Lakehouse", "Java 批量上传", or "Kafka Java 写入". Keywords: Java SDK, BulkloadStream, RealtimeStream, Kafka consumer, batch write, real-time write |
ClickZetta Java SDK
The Java SDK provides two write interfaces:
- BulkloadStream: batch writes for scheduled ETL and local file imports. It does not support primary-key tables and is not recommended when writes run more often than every 5 minutes.
- RealtimeStream: real-time writes for Kafka consumption and streaming ingestion. Written data can be queried within seconds.

Read `references/bulkload.md` for batch writes and `references/realtime.md` for real-time writes.
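For local file imports with BulkloadStream, each line of the file has to be turned into typed values before they are passed to `row.setValue`. The following is a minimal sketch, assuming a hypothetical comma-separated layout with three columns (string, integer, double) in table order, like the values in the `orders` quick example. `CsvRowParser` and `ColumnType` are illustrative names, not SDK types, and the naive split does not handle quoted fields, so a real import should use a proper CSV parser.

```java
import java.util.List;

/** Hypothetical helper (not part of the SDK): converts one CSV line into typed column values. */
final class CsvRowParser {

    /** Column types in table order. */
    enum ColumnType { STRING, INT, DOUBLE }

    private final List<ColumnType> types;

    CsvRowParser(List<ColumnType> types) {
        this.types = List.copyOf(types);
    }

    /** Splits on commas (no quoting support) and converts each field to its column type. */
    Object[] parse(String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != types.size()) {
            throw new IllegalArgumentException(
                    "Expected " + types.size() + " fields but got " + fields.length);
        }
        Object[] values = new Object[fields.length];
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            switch (types.get(i)) {
                case INT:
                    values[i] = Integer.parseInt(field);
                    break;
                case DOUBLE:
                    values[i] = Double.parseDouble(field);
                    break;
                default:
                    values[i] = field;
                    break;
            }
        }
        return values;
    }
}
```

Each parsed array would then be written with `row.setValue(i, values[i])` for every column index `i`, followed by `stream.apply(row)`.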
Maven Dependency
```xml
<dependency>
    <groupId>com.clickzetta</groupId>
    <artifactId>clickzetta-java</artifactId>
    <version>2.0.0</version>
</dependency>
```
RealtimeStream with Kafka also requires:
```xml
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.2.0</version>
</dependency>
```
Connection URL Format
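Build the client from individual connection fields:
```java
ClickZettaClient client = ClickZettaClient.newBuilder()
    .service("cn-shanghai-alicloud.api.clickzetta.com")
    .instance("your_instance")
    .workspace("your_workspace")
    .schema("public")
    .username("your_user")
    .password("your_password")
    .vcluster("default")
    .build();
```
Alternatively, build it from a JDBC-style URL. The bulk-load and real-time URLs differ only in the virtual-cluster parameter name (`virtualcluster` versus `vcluster`):
```java
import java.text.MessageFormat;

// Placeholder values; replace with your own
String instance = "your_instance";
String endpoint = "cn-shanghai-alicloud.api.clickzetta.com"; // region endpoint
String workspace = "your_workspace";
String schema = "public";
String username = "your_user";
String password = "your_password";
String vcluster = "default";

// Bulk-load URL (virtualcluster=)
String bulkUrl = MessageFormat.format(
    "jdbc:clickzetta://{0}.{1}/{2}?schema={3}&username={4}&password={5}&virtualcluster={6}",
    instance, endpoint, workspace, schema, username, password, vcluster);
// Real-time URL (vcluster=)
String rtUrl = MessageFormat.format(
    "jdbc:clickzetta://{0}.{1}/{2}?schema={3}&username={4}&password={5}&vcluster={6}",
    instance, endpoint, workspace, schema, username, password, vcluster);

// Pass bulkUrl for BulkloadStream or rtUrl for RealtimeStream
ClickZettaClient client = ClickZettaClient.newBuilder().url(bulkUrl).build();
```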
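Because the two URL variants above share every field except the virtual-cluster key, it can help to build both from one object so they cannot drift apart. The following is a JDK-only sketch; `ClickZettaUrls` is a hypothetical helper, not an SDK class. Values are inserted verbatim, so the sketch rejects query values containing `&`, `=`, `?` or `#` rather than guessing how the driver decodes escaped characters (check the driver documentation if your password needs them).

```java
import java.util.List;

/** Hypothetical helper (not part of the SDK) that builds both jdbc:clickzetta URL variants. */
final class ClickZettaUrls {
    // Characters that would change how the query string is parsed if inserted verbatim
    private static final List<String> RESERVED = List.of("&", "=", "?", "#");

    private final String instance;
    private final String endpoint;
    private final String workspace;
    private final String schema;
    private final String username;
    private final String password;
    private final String vcluster;

    ClickZettaUrls(String instance, String endpoint, String workspace, String schema,
                   String username, String password, String vcluster) {
        for (String value : List.of(schema, username, password, vcluster)) {
            for (String r : RESERVED) {
                if (value.contains(r)) {
                    // Do not echo the value: it may be the password
                    throw new IllegalArgumentException("Query value contains reserved character '" + r + "'");
                }
            }
        }
        this.instance = instance;
        this.endpoint = endpoint;
        this.workspace = workspace;
        this.schema = schema;
        this.username = username;
        this.password = password;
        this.vcluster = vcluster;
    }

    /** vclusterKey is "virtualcluster" (bulk-load URL) or "vcluster" (real-time URL). */
    private String build(String vclusterKey) {
        return "jdbc:clickzetta://" + instance + "." + endpoint + "/" + workspace
                + "?schema=" + schema
                + "&username=" + username
                + "&password=" + password
                + "&" + vclusterKey + "=" + vcluster;
    }

    String bulkloadUrl() { return build("virtualcluster"); }

    String realtimeUrl() { return build("vcluster"); }
}
```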
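Use a plain JDBC connection for DDL and queries (imports: `java.sql.Connection`, `java.sql.DriverManager`):
```java
Class.forName("com.clickzetta.client.jdbc.ClickZettaDriver"); // register the driver
try (Connection conn = DriverManager.getConnection(jdbcUrl)) { // jdbcUrl: jdbc:clickzetta://... format above
    // run DDL and queries here; the connection is closed automatically
}
```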
BulkloadStream Quick Example
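```java
// Batch (bulk-load) stream that appends to public.orders
BulkloadStream stream = client.newBulkloadStreamBuilder()
    .schema("public")
    .table("orders")
    .operate(RowStream.BulkLoadOperate.APPEND)
    .build();

// Values are set by column position, in table column order
Row row = stream.createRow();
row.setValue(0, "order-001");
row.setValue(1, 1);
row.setValue(2, 299.99);
stream.apply(row);

// close() ends the upload; then poll until the commit is no longer RUNNING
// (Thread.sleep throws InterruptedException, so the enclosing method must handle it)
stream.close();
while (stream.getState() == StreamState.RUNNING) {
    Thread.sleep(1000);
}
if (stream.getState() == StreamState.FAILED) {
    throw new RuntimeException(stream.getErrorMessage());
}
client.close();
```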
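The polling loop above waits indefinitely if the commit never leaves RUNNING. A bounded wait is a common alternative. The sketch below is JDK-only and generic over the state type; `StatePoller` and the timeout values are assumptions for illustration, not SDK features.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: polls a state until it is no longer "running" or a deadline passes. */
final class StatePoller {
    private StatePoller() {}

    static <S> S awaitNotRunning(Supplier<S> state, Predicate<S> isRunning,
                                 Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        S current = state.get();
        while (isRunning.test(current)) {
            // Overflow-safe comparison: true once the deadline has been reached
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Still running after " + timeout);
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                throw new IllegalStateException("Interrupted while waiting", e);
            }
            current = state.get();
        }
        return current;
    }
}
```

With the types from the quick example, a call could look like `StatePoller.awaitNotRunning(stream::getState, s -> s == StreamState.RUNNING, Duration.ofMinutes(30), Duration.ofSeconds(1))`, followed by the same FAILED check. The 30-minute limit is an arbitrary placeholder; size it to your largest expected batch.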
RealtimeStream Quick Example
```java
// Client-side options; withMutationBufferLinesNum sets a row-count buffer size
Options options = Options.builder()
    .withMutationBufferLinesNum(10)
    .build();

// APPEND_ONLY mode: every row is inserted
RealtimeStream stream = client.newRealtimeStreamBuilder()
    .operate(RowStream.RealTimeOperate.APPEND_ONLY)
    .options(options)
    .schema("public")
    .table("events")
    .build();

// Real-time rows carry an operator and are set by column name
Row row = stream.createRow(Stream.Operator.INSERT);
row.setValue("id", 1);
row.setValue("event", "{\"type\":\"click\"}");
stream.apply(row);
stream.close();
```
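Judging by its name, `withMutationBufferLinesNum(10)` sets how many rows the client buffers before sending. As a conceptual model only (assuming a purely count-based trigger, which may not match the SDK's internal flushing), rows accumulate until the limit is reached and are then sent as one batch: a small value means more frequent, smaller sends, and a large value means fewer, larger ones. `CountingBuffer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Conceptual model (not the SDK): rows accumulate until maxLines, then go out as one batch. */
final class CountingBuffer<T> {
    private final int maxLines;
    private final Consumer<List<T>> sender;
    private List<T> pending = new ArrayList<>();

    CountingBuffer(int maxLines, Consumer<List<T>> sender) {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        this.maxLines = maxLines;
        this.sender = sender;
    }

    /** Buffers a row and sends the batch once it reaches maxLines rows. */
    void apply(T row) {
        pending.add(row);
        if (pending.size() >= maxLines) {
            flush();
        }
    }

    /** Sends whatever is buffered, if anything. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = pending;
        pending = new ArrayList<>();
        sender.accept(batch);
    }

    /** In this model, closing sends the remaining partial batch. */
    void close() {
        flush();
    }

    int pendingCount() {
        return pending.size();
    }
}
```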
RealtimeStream CDC Example for Primary-Key Tables
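```java
// CDC mode writes to a primary-key table; options is the Options object built above
RealtimeStream stream = client.newRealtimeStreamBuilder()
    .operate(RowStream.RealTimeOperate.CDC)
    .options(options)
    .schema("public")
    .table("orders")
    .build();

// UPSERT: insert the row, or replace the existing row with the same primary key (txid)
Row row = stream.createRow(Stream.Operator.UPSERT);
row.setValue("txid", "order-001");
row.setValue("amount", 299.99);
row.setValue("status", "paid");
stream.apply(row);

// DELETE_IGNORE: delete by primary key; only the key column is set
Row del = stream.createRow(Stream.Operator.DELETE_IGNORE);
del.setValue("txid", "order-001");
stream.apply(del);
stream.close();
```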
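The CDC example relies on primary-key semantics: UPSERT inserts a row or replaces the row with the same key, and DELETE_IGNORE, as its name suggests, removes the row with that key and does nothing if it does not exist. The in-memory model below illustrates those semantics on a map keyed by `txid`. `CdcTableModel` is hypothetical, and replacing the whole row on UPSERT is a simplifying assumption; this sketch does not model partial-column updates.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Conceptual model (not the SDK) of CDC writes to a primary-key table. */
final class CdcTableModel {
    // Primary key -> column values
    private final Map<String, Map<String, Object>> rows = new LinkedHashMap<>();

    /** Inserts a new row, or replaces the existing row with the same key. */
    void upsert(String key, Map<String, ?> values) {
        Map<String, Object> copy = Map.copyOf(values);
        rows.put(key, copy);
    }

    /** Deletes the row with this key; a missing key is silently ignored. */
    void deleteIgnore(String key) {
        rows.remove(key);
    }

    Map<String, Object> get(String key) {
        return rows.get(key);
    }

    int size() {
        return rows.size();
    }
}
```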
Selection Guide
| Scenario | Recommended interface |
|---|---|
| Scheduled batch ETL, hourly or daily | BulkloadStream |
| Kafka real-time consumption | RealtimeStream |
| Writes more often than every 5 minutes | RealtimeStream |
| Primary-key table writes with UPSERT or DELETE | RealtimeStream CDC mode |
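The selection guide can be expressed as a small decision function. The rules and the 5-minute threshold come from the tables in this section; `WriteInterfaceSelector`, its parameters, and the `Choice` values are hypothetical names for illustration.

```java
/** Hypothetical encoding of the selection guide; not part of the SDK. */
final class WriteInterfaceSelector {
    enum Choice { BULKLOAD_STREAM, REALTIME_STREAM_APPEND_ONLY, REALTIME_STREAM_CDC }

    private WriteInterfaceSelector() {}

    static Choice choose(boolean primaryKeyTable, boolean kafkaSource, long writeIntervalMinutes) {
        // BulkloadStream does not support primary-key tables; UPSERT/DELETE need CDC mode
        if (primaryKeyTable) {
            return Choice.REALTIME_STREAM_CDC;
        }
        // Kafka consumption and writes more often than every 5 minutes favor RealtimeStream
        if (kafkaSource || writeIntervalMinutes < 5) {
            return Choice.REALTIME_STREAM_APPEND_ONLY;
        }
        // Scheduled hourly or daily batches
        return Choice.BULKLOAD_STREAM;
    }
}
```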
Usage Limits
| Limit | BulkloadStream | RealtimeStream |
|---|---|---|
| Primary-key tables | Not supported | Supported in CDC mode |
| Writes more often than every 5 minutes | Not recommended | Supported |
| Data visibility latency | After close(), once the commit completes | Queryable within seconds |
| Table Stream/Dynamic Table visibility | After close(), once the commit completes | After about 1 minute |
| Schema changes | Recreate the stream | Stop the write task, then restart it about 90 minutes after the schema change |
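For RealtimeStream, the schema-change limit translates into a simple time check before a stopped write task is restarted. The following is a minimal sketch using `java.time`. The 90-minute delay is the approximate figure from the table and should be treated as a guideline, and `SchemaChangeWindow` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for the RealtimeStream schema-change restart guideline. */
final class SchemaChangeWindow {
    /** Approximate wait after a schema change, taken from the usage limits table. */
    static final Duration REALTIME_RESTART_DELAY = Duration.ofMinutes(90);

    private SchemaChangeWindow() {}

    static Instant earliestRealtimeRestart(Instant schemaChangedAt) {
        return schemaChangedAt.plus(REALTIME_RESTART_DELAY);
    }

    /** True once at least the restart delay has elapsed since the schema change. */
    static boolean canRestartRealtime(Instant schemaChangedAt, Instant now) {
        return !now.isBefore(earliestRealtimeRestart(schemaChangedAt));
    }
}
```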