【hudi】数据湖客户端运维工具Hudi-Cli实战
数据湖客户端运维工具Hudi-Cli实战
help
hudi:student_mysql_cdc_hudi_fl->help
AVAILABLE COMMANDSArchived Commits Commandtrigger archival: trigger archivalshow archived commits: Read commits from archived files and show detailsshow archived commit stats: Read commits from archived files and show detailsBootstrap Commandbootstrap run: Run a bootstrap action for current Hudi tablebootstrap index showmapping: Show bootstrap index mappingbootstrap index showpartitions: Show bootstrap indexed partitionsBuilt-In Commandshelp: Display help about available commandsstacktrace: Display the full stacktrace of the last error.clear: Clear the shell screen.quit, exit: Exit the shell.history: Display or save the history of previously run commandsversion: Show version infoscript: Read and execute commands from a file.Cleans Commandcleans show: Show the cleansclean showpartitions: Show partition level details of a cleancleans run: run cleanClustering Commandclustering run: Run Clusteringclustering scheduleAndExecute: Run Clustering. Make a cluster plan first and execute that plan immediatelyclustering schedule: Schedule ClusteringCommits Commandcommits compare: Compare commits with another Hoodie tablecommits sync: Sync commits with another Hoodie tablecommit showpartitions: Show partition level details of a commitcommits show: Show the commitscommits showarchived: Show the archived commitscommit showfiles: Show file level details of a commitcommit show_write_stats: Show write stats of a commitCompaction Commandcompaction run: Run Compaction for given instant timecompaction scheduleAndExecute: Schedule compaction plan and execute this plancompaction showarchived: Shows compaction details for a specific compaction instantcompaction repair: Renames the files to make them consistent with the timeline as dictated by Hoodie metadata. Use when compaction unschedule fails partially.compaction schedule: Schedule Compactioncompaction show: Shows compaction details for a specific compaction instantcompaction unscheduleFileId: UnSchedule Compaction for a fileIdcompaction validate: Validate Compactioncompaction unschedule: Unschedule Compactioncompactions show all: Shows all compactions that are in active timelinecompactions showarchived: Shows compaction details for specified time windowDiff Commanddiff partition: Check how file differs across range of commits. It is meant to be used only for partitioned tables.diff file: Check how file differs across range of commitsExport Commandexport instants: Export Instants and their metadata from the TimelineFile System View Commandshow fsview all: Show entire file-system viewshow fsview latest: Show latest file-system viewHDFS Parquet Import Commandhdfsparquetimport: Imports Parquet table to a hoodie tableHoodie Log File Commandshow logfile records: Read records from log filesshow logfile metadata: Read commit metadata from log filesHoodie Sync Validate Commandsync validate: Validate the sync by counting the number of recordsKerberos Authentication Commandkerberos kdestroy: Destroy Kerberos authenticationkerberos kinit: Perform Kerberos authenticationMarkers Commandmarker delete: Delete the markerMetadata Commandmetadata stats: Print stats about the metadatametadata list-files: Print a list of all files in a partition from the metadatametadata list-partitions: List all partitions from metadatametadata validate-files: Validate all files in all partitions from the metadatametadata delete: Remove the Metadata Tablemetadata create: Create the Metadata Table if it does not existmetadata init: Update the metadata table from commits since the creationmetadata set: Set options for Metadata TableRepairs Commandrepair deduplicate: De-duplicate a partition path contains duplicates & produce repaired files to replace withrename partition: Rename partition. Usage: rename partition --oldPartition <oldPartition> --newPartition <newPartition>repair overwrite-hoodie-props: Overwrite hoodie.properties with provided file. Risky operation. Proceed with caution!repair migrate-partition-meta: Migrate all partition meta file currently stored in text format to be stored in base file format. See HoodieTableConfig#PARTITION_METAFILE_USE_DATA_FORMAT.repair addpartitionmeta: Add partition metadata to a table, if not presentrepair deprecated partition: Repair deprecated partition ("default"). Re-writes data from the deprecated partition into __HIVE_DEFAULT_PARTITION__repair show empty commit metadata: show failed commitsrepair corrupted clean files: repair corrupted clean filesRollbacks Commandshow rollback: Show details of a rollback instantcommit rollback: Rollback a commitshow rollbacks: List all rollback instantsSavepoints Commandsavepoint rollback: Savepoint a commitsavepoints show: Show the savepointssavepoint create: Savepoint a commitsavepoint delete: Delete the savepointSpark Env Commandset: Set spark launcher env to clishow env: Show spark launcher env by keyshow envs all: Show spark launcher envsStats Commandstats filesizes: File Sizes. Display summary stats on sizes of filesstats wa: Write Amplification. Ratio of how many records were upserted to how many records were actually writtenTable Commandtable update-configs: Update the table configs with configs with provided file.table recover-configs: Recover table configs, from update/delete that failed midway.refresh, metadata refresh, commits refresh, cleans refresh, savepoints refresh: Refresh table metadatacreate: Create a hoodie table if not presenttable delete-configs: Delete the supplied table configs from the table.fetch table schema: Fetches latest table schemaconnect: Connect to a hoodie tabledesc: Describe Hoodie Table propertiesTemp View Commandtemp_query, temp query: query against created temp viewtemps_show, temps show: Show all views nametemp_delete, temp delete: Delete view nameTimeline Commandmetadata timeline show incomplete: List all incomplete instants in active timeline of metadata tablemetadata timeline show active: List all instants in active timeline of metadata tabletimeline show incomplete: List all incomplete instants in active timelinetimeline show active: List all instants in active timelineUpgrade Or Downgrade Commanddowngrade table: Downgrades a tableupgrade table: Upgrades a tableUtils Commandutils loadClass: Load a class
kerberos
kerberos kinit --principal xxx@XXXXX.COM --keytab /xxx/kerberos/xxx.keytab
先看下样例表的表结构:
分区表哦!
-- FLink SQL建表语句
create table student_mysql_cdc_hudi_fl(`_hoodie_commit_time` string comment 'hoodie commit time',`_hoodie_commit_seqno` string comment 'hoodie commit seqno',`_hoodie_record_key` string comment 'hoodie record key',`_hoodie_partition_path` string comment 'hoodie partition path',`_hoodie_file_name` string comment 'hoodie file name',`s_id` bigint not null comment '主键',`s_name` string not null comment '姓名',`s_age` int comment '年龄',`s_sex` string comment '性别',`s_part` string not null comment '分区字段',`create_time` timestamp(6) not null comment '创建时间',`dl_ts` timestamp(6) not null,`dl_s_sex` string not null,PRIMARY KEY(s_id) NOT ENFORCED
)PARTITIONED BY (`dl_s_sex`) with (
,'connector' = 'hudi'
,'hive_sync.table' = 'student_mysql_cdc_hudi'
,'hoodie.datasource.write.drop.partition.columns' = 'true'
,'hoodie.datasource.write.hive_style_partitioning' = 'true'
,'hoodie.datasource.write.partitionpath.field' = 'dl_s_sex'
,'hoodie.datasource.write.precombine.field' = 'dl_ts'
,'path' = 'hdfs://xxx/hudi_db.db/student_mysql_cdc_hudi'
,'precombine.field' = 'dl_ts'
,'primaryKey' = 's_id'
)
table
connect
connect --path /xxx/hudi_db.db/student_mysql_cdc_hudi
desc
desc
refresh
refresh
fetch table schema
fetch table schema
"type" : "record","name" : "student_mysql_cdc_hudi_fl_record","namespace" : "hoodie.student_mysql_cdc_hudi_fl","fields" : [ {"name" : "_hoodie_commit_time","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "_hoodie_commit_seqno","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "_hoodie_record_key","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "_hoodie_partition_path","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "_hoodie_file_name","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "_hoodie_operation","type" : [ "null", "string" ],"doc" : "","default" : null}, {"name" : "s_id","type" : "long"}, {"name" : "s_name","type" : "string"}, {"name" : "s_age","type" : [ "null", "int" ],"default" : null}, {"name" : "s_sex","type" : [ "null", "string" ],"default" : null}, {"name" : "s_part","type" : "string"}, {"name" : "create_time","type" : {"type" : "long","logicalType" : "timestamp-micros"}}, {"name" : "dl_ts","type" : {"type" : "long","logicalType" : "timestamp-micros"}}, {"name" : "dl_s_sex","type" : "string"} ]
}
commit
commits show
commits show --sortBy "Total Bytes Written" --desc true --limit 10
commits showarchived
commits showarchived
commit showfiles
commit showfiles --commit 20230915164442583
commit showfiles --commit 20230915164442583 --sortBy "Partition Path"
commit showpartitions
commit showpartitions --commit 20230915164442583
commit showpartitions --commit 20230915164442583 --sortBy "Total Bytes Written" --desc true --limit 10
commit show_write_stats
commit show_write_stats --commit 20230915164442583
File System View
show fsview all
show fsview all
show fsview latest
show fsview latest --partitionPath dl_s_sex=female
Log File
show logfile records
# 注意10 是需要取数据记录条数
show logfile records 10 /xxx/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/.bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0
数据是json格式的:
{"_hoodie_commit_time": "20230915163856302","_hoodie_commit_seqno": "20230915163856302_0_83","_hoodie_record_key": "88","_hoodie_partition_path": "dl_s_sex=female","_hoodie_file_name": "bf4b06b4-e897-42df-8a3c-a3a2f737d367","_hoodie_operation": "I","s_id": 88,"s_name": "傅亮","s_age": 4,"s_sex": "female","s_part": "2017/11/20","create_time": 790128367000000,"dl_ts": -28800000000,"dl_s_sex": "female"
}
show logfile metadata
show logfile metadata /xxx/xxx/hive/hudi_db.db/student_mysql_cdc_hudi/dl_s_sex=female/dl_create_time_yyyy=1971/dl_create_time_mm=03/.dadac2dd-7e5e-46c3-9b27-f1f03e04a90c_20230915151426134.log.1_0
图片中还有FooterMetadata
列没显示全
{"SCHEMA": "{\"type\":\"record\",\"name\":\"student_mysql_cdc_hudi_fl_record\",\"namespace\":\"hoodie.student_mysql_cdc_hudi_fl\",\"fields\":[{\"name\":\"_hoodie_commit_time\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_commit_seqno\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_record_key\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_partition_path\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_file_name\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"_hoodie_operation\",\"type\":[\"null\",\"string\"],\"doc\":\"\",\"default\":null},{\"name\":\"s_id\",\"type\":\"long\"},{\"name\":\"s_name\",\"type\":\"string\"},{\"name\":\"s_age\",\"type\":[\"null\",\"int\"],\"default\":null},{\"name\":\"s_sex\",\"type\":[\"null\",\"string\"],\"default\":null},{\"name\":\"s_part\",\"type\":\"string\"},{\"name\":\"create_time\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_ts\",\"type\":{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"}},{\"name\":\"dl_s_sex\",\"type\":\"string\"}]}","INSTANT_TIME": "20230915164442583"
}
differ
diff partition
diff partition dl_s_sex=female
differ file
# 需要提供FileID。就是log文件的部分
# 如log文件:.bf4b06b4-e897-42df-8a3c-a3a2f737d367_20230915163856302.log.1_0-1-0
diff file bf4b06b4-e897-42df-8a3c-a3a2f737d367
rollbacks
show rollbacks
show rollbacks
stats
stats filesizes
stats filesizes --partitionPath dl_s_sex=female --sortBy "95th" --desc true --limit 3
stats wa
stats wa
compaction
compactions show all
compactions show all
compactions showarchived
compactions showarchived
compaction showarchived
compaction showarchived 20230915200042501
compaction show
compaction show 20230915174042680
参考文章:
Apache Hudi数据湖hudi-cli客户端使用
相关文章:

【hudi】数据湖客户端运维工具Hudi-Cli实战
数据湖客户端运维工具Hudi-Cli实战 help hudi:student_mysql_cdc_hudi_fl->help AVAILABLE COMMANDSArchived Commits Commandtrigger archival: trigger archivalshow archived commits: Read commits from archived files and show detailsshow archived commit stats: …...
RK3588 添加ROOT权限
一.ROOT简介 ROOT权限是Linux和Unix系统中的超级管理员用户帐户,该帐户拥有整个系统的最高权利,可以执行几乎所有操作。ROOT就是获取安卓系统中的最高用户权限,以便执行一些需要高权限才能执行的操作(包括卸载系统自带程序、刷机、备份、还原…...

【云原生】k8s-----集群调度
目录 1.k8s的list-watch机制 1.1 list-watc机制简介 1.2 根据list-watch机制,pod的创建流程 2.scheduler的调度策略 2.1 scheduler的调度策略简介 2.2 Scheduler预选策略的算法 2.3 Scheduler优选策略的算法 3. k8s中的标签管理及nodeSelector和nodeName的 调…...

一键集成prometheus监控微服务接口平均响应时长
一、效果展示 二、环境准备 prometheus + grafana环境 参考博文:https://blog.csdn.net/luckywuxn/article/details/129475991 三、导入依赖 <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter...

2023/9/13 -- C++/QT
作业: 1> 将之前定义的栈类和队列类都实现成模板类 栈: #include <iostream> #define MAX 40 using namespace std;template <typename T> class Stack{ private:T *data;int top; public:Stack();~Stack();Stack(const Stack &ot…...

mybatis mapper.xml转建表语句
从网上下载了代码,但是发现没有DDL建表语句,只能自己手动创建了,感觉太麻烦,就写了一个工具类 将所有的mapper.xml放入到一个文件夹中,程序会自动读取生成建表语句 依赖的jar <dependency><groupId>org.d…...
封装使用Axios进行前后端交互
Axios是一个强大的HTTP客户端,用于在Vue.js应用中进行前后端数据交互。本文将介绍如何在Vue中使用Axios,并通过一个企业应用场景来演示其实际应用。 Axios简介 公众号:Code程序人生,个人网站:https://creatorblog.cn A…...
SOA、分布式、微服务
SOA: SOA是一种软件设计架构,用于构建分布式系统和应用程序。它将应用程序拆分为一系列松耦合的服务,这些服务通过标准化的接口进行通信,并能够以可编程方式组合和重用。SOA的目标是提高系统的灵活性、可扩展性和可维护性。 特点&…...
json数据传输压缩以及数据切片分割分块传输多种实现方法,大数据量情况下zlib压缩以及bytes指定长度分割
json数据传输压缩以及数据切片分割分块传输多种实现方法,大数据量情况下zlib压缩以及bytes指定长度分割。 import sys import zlib import json import mathKAFKA_MAX_SIZE 1024 * 1024 CONTENT_MIN_MAX_SIZE KAFKA_MAX_SIZE * 0.9def split_data(data):"&q…...

移动端APP测试-如何指定测试策略、测试标准?
制定项目的测试策略是一个重要的步骤,可以帮助测试团队明确测试目标、测试范围、测试方法、测试资源、测试风险等,从而提高测试效率和质量。本篇是一些经验总结,理论分享。并不是绝对正确的,也欢迎大家一起讨论。 文章目录 一、测…...

【Redis】深入探索 Redis 主从结构的创建、配置及其底层原理
文章目录 前言一、对 Redis 主从结构的认识1.1 什么是主从结构1.2 主从结构解决的问题 二、主从结构创建2.1 配置并建立从节点2.2.1 从节点配置文件2.2.2 启动并连接 Redis 主从节点2.2.3 SLAVEOF 命令2.2.4 断开主从关系 2.2 查看主从节点的信息2.2.1 INFO REPLICATION 命令2.…...

CSS 滚动驱动动画 scroll-timeline ( scroll-timeline-name ❤️ scroll-timeline-axis )
scroll-timelinescroll-timeline-name❤️scroll-timeline-axis 解决问题语法 animation-timeline-nameanimation-timeline-axis scroll-timeline ( scroll-timeline-name ❤️ scroll-timeline-axis ) 在 scroll() 的最后我们遇到了因为定位问题导致滚动效果失效的情况, 当…...

9.19号作业
2> 完成文本编辑器的保存工作 widget.h #ifndef WIDGET_H #define WIDGET_H#include <QWidget> #include <QFontDialog> #include <QFont> #include <QMessageBox> #include <QDebug> #include <QColorDialog> #include <QColor&g…...

Mybatis学习笔记9 动态SQL
Mybatis学习笔记8 查询返回专题_biubiubiu0706的博客-CSDN博客 动态SQL的业务场景: 例如 批量删除 get请求 uri?id18&id19&id20 或者post id18&id19&id20 String[] idsrequest.getParameterValues("id") 那么这句SQL是需要动态的 还…...

element表格 和后台联调
1.配置接口 projectList:/api/goods/xxx,//产品列表2.请求接口(get请求默认参数page) // 产品列表 pageprojectList(params){return axios.get(base.projectList,{params})}3.获取数据 直接放到created里边去了 刷新页面就可以看到 async projectList(page){let res await t…...

基于SSM的智慧城市实验室主页系统的设计与实现
末尾获取源码 开发语言:Java Java开发工具:JDK1.8 后端框架:SSM 前端:采用Vue技术开发 数据库:MySQL5.7和Navicat管理工具结合 服务器:Tomcat8.5 开发软件:IDEA / Eclipse 是否Maven项目&#x…...

怒赞,阿里P8推荐的Java面试宝典:41个专题PDF(史上最全+面试必备)
《尼恩Java面试宝典》 40岁老架构师 尼恩 经过对大量 Java面试题 的不断梳理、迭代, 编著成5000页的《尼恩Java面试宝典》,致力于体系化, 系统化,形象化 梳理,形成一个大的知识体系,从而帮助大家 进大厂&a…...
线程池各个参数设置说明
1. corePoolSize 核心线程数 看处理业务属于IO密集型还是属于cpu密集型IO密集型: 通常设置为N1,还有一个计算公式:线程数 cpu数*(线程等待时间/线程总的处理时间) 但是由于服务器除了这个服务可能还部署有其他服务,…...

springBoot对接多个mq并且实现延迟队列---未完待续
mq调用流程 创建消息转换器 package com.wd.config;import org.springframework.amqp.support.converter.Jackson2JsonMessageConverter; import org.springframework.amqp.support.converter.MessageConverter; import org.springframework.context.annotation.Bean; import o…...

Pytorch从零开始实战04
Pytorch从零开始实战——猴痘病识别 本系列来源于365天深度学习训练营 原作者K同学 文章目录 Pytorch从零开始实战——猴痘病识别环境准备数据集模型选择模型训练数据可视化其他模型图片预测 环境准备 本文基于Jupyter notebook,使用Python3.8,Pytor…...

LBE-LEX系列工业语音播放器|预警播报器|喇叭蜂鸣器的上位机配置操作说明
LBE-LEX系列工业语音播放器|预警播报器|喇叭蜂鸣器专为工业环境精心打造,完美适配AGV和无人叉车。同时,集成以太网与语音合成技术,为各类高级系统(如MES、调度系统、库位管理、立库等)提供高效便捷的语音交互体验。 L…...

Opencv中的addweighted函数
一.addweighted函数作用 addweighted()是OpenCV库中用于图像处理的函数,主要功能是将两个输入图像(尺寸和类型相同)按照指定的权重进行加权叠加(图像融合),并添加一个标量值&#x…...
Golang dig框架与GraphQL的完美结合
将 Go 的 Dig 依赖注入框架与 GraphQL 结合使用,可以显著提升应用程序的可维护性、可测试性以及灵活性。 Dig 是一个强大的依赖注入容器,能够帮助开发者更好地管理复杂的依赖关系,而 GraphQL 则是一种用于 API 的查询语言,能够提…...
渲染学进阶内容——模型
最近在写模组的时候发现渲染器里面离不开模型的定义,在渲染的第二篇文章中简单的讲解了一下关于模型部分的内容,其实不管是方块还是方块实体,都离不开模型的内容 🧱 一、CubeListBuilder 功能解析 CubeListBuilder 是 Minecraft Java 版模型系统的核心构建器,用于动态创…...
Qwen3-Embedding-0.6B深度解析:多语言语义检索的轻量级利器
第一章 引言:语义表示的新时代挑战与Qwen3的破局之路 1.1 文本嵌入的核心价值与技术演进 在人工智能领域,文本嵌入技术如同连接自然语言与机器理解的“神经突触”——它将人类语言转化为计算机可计算的语义向量,支撑着搜索引擎、推荐系统、…...
Unit 1 深度强化学习简介
Deep RL Course ——Unit 1 Introduction 从理论和实践层面深入学习深度强化学习。学会使用知名的深度强化学习库,例如 Stable Baselines3、RL Baselines3 Zoo、Sample Factory 和 CleanRL。在独特的环境中训练智能体,比如 SnowballFight、Huggy the Do…...

Spring数据访问模块设计
前面我们已经完成了IoC和web模块的设计,聪明的码友立马就知道了,该到数据访问模块了,要不就这俩玩个6啊,查库势在必行,至此,它来了。 一、核心设计理念 1、痛点在哪 应用离不开数据(数据库、No…...

C++ Visual Studio 2017厂商给的源码没有.sln文件 易兆微芯片下载工具加开机动画下载。
1.先用Visual Studio 2017打开Yichip YC31xx loader.vcxproj,再用Visual Studio 2022打开。再保侟就有.sln文件了。 易兆微芯片下载工具加开机动画下载 ExtraDownloadFile1Info.\logo.bin|0|0|10D2000|0 MFC应用兼容CMD 在BOOL CYichipYC31xxloaderDlg::OnIni…...

python执行测试用例,allure报乱码且未成功生成报告
allure执行测试用例时显示乱码:‘allure’ �����ڲ����ⲿ���Ҳ���ǿ�&am…...

零知开源——STM32F103RBT6驱动 ICM20948 九轴传感器及 vofa + 上位机可视化教程
STM32F1 本教程使用零知标准板(STM32F103RBT6)通过I2C驱动ICM20948九轴传感器,实现姿态解算,并通过串口将数据实时发送至VOFA上位机进行3D可视化。代码基于开源库修改优化,适合嵌入式及物联网开发者。在基础驱动上新增…...