需求:在mysql数据库中有一个人员信息表(userinfo) – 创建数据库 2020年04月30日的数据如下: insert into 结果: UPDATE 第一次最后结果视图: UPDATE 从以上数据可以看出,随着时间推移而变化,我们需要将人员的所有变化的历史信息都保存下来,请在Hive中使用拉链表进行实现。 2、使用kettle把mysql表数据输出到hive表
数据字段如下:
在Mysql中的建表语句为:
create database if not exists exam;
– 创建人员表
create table if not exists exam
.userinfo
(
user_id varchar(50), – 人员编号
phone_number varchar(50), – 人员手机号
health_code varchar(50), – 健康码的颜色
reg_time varchar(50), – 用户注册日期
modify_time varchar(50) – 健康码颜色修改日期
);
插入语句:exam
.userinfo
(user_id, phone_number, health_code, reg_time,modify_time) values
(‘001’, ‘15953100001’, ‘绿色’, ‘2020-03-14’, ‘2020-04-30’),
(‘002’, ‘15953100002’, ‘绿色’, ‘2020-03-14’, ‘2020-04-30’),
(‘003’, ‘15953100003’, ‘黄色’, ‘2020-03-14’, ‘2020-04-30’),
(‘004’, ‘15953100004’, ‘红色’, ‘2020-03-14’, ‘2020-04-30’);
2020年05月01日的数据如下:
修改和添加的语句:exam
.userinfo
SET health_code = ‘绿色’, modify_time = ‘2020-05-01’ WHERE user_id = ‘003’;
UPDATE exam
.userinfo
SET health_code = ‘黄色’, modify_time = ‘2020-05-01’ WHERE user_id = ‘004’;
INSERT INTO exam
.userinfo
(user_id, phone_number, health_code, reg_time,modify_time) VALUES
(‘005’, ‘15953100005’, ‘绿色’, ‘2020-05-01’, ‘2020-05-01’);
2020年05月02日的数据如下:
修改和添加语句:exam
.userinfo
SET health_code = ‘绿色’, modify_time = ‘2020-05-02’ WHERE user_id = ‘004’;
UPDATE exam
.userinfo
SET health_code = ‘黄色’, modify_time = ‘2020-05-02’ WHERE user_id = ‘005’;
INSERT INTO exam
.userinfo
(user_id, phone_number, health_code, reg_time,modify_time) VALUES
(‘006’, ‘15953100006’, ‘绿色’, ‘2020-05-02’, ‘2020-05-02’);
说明:分区字段为dt,判断字段为modify_time,在拉链表中添加的字段为dw_start_date(生效时间),dw_end_date(结束时间)。
第二次最后结果视图:
解题思路:
1、创建mysql库和表-- 创建数据库 CREATE DATABASE IF NOT EXISTS exam; -- 创建人员表 CREATE TABLE IF NOT EXISTS `exam`.`userinfo`( user_id VARCHAR(50), -- 人员编号 phone_number VARCHAR(50), -- 人员手机号 health_code VARCHAR(50), -- 健康码的颜色 reg_time VARCHAR(50), -- 用户注册日期 modify_time VARCHAR(50) -- 健康码颜色修改日期 ); INSERT INTO `exam`.`userinfo`(user_id, phone_number, health_code, reg_time,modify_time) VALUES ('001', '15953100001', '绿色', '2020-03-14', '2020-04-30'), ('002', '15953100002', '绿色', '2020-03-14', '2020-04-30'), ('003', '15953100003', '黄色', '2020-03-14', '2020-04-30'), ('004', '15953100004', '红色', '2020-03-14', '2020-04-30'); UPDATE `exam`.`userinfo` SET health_code = '绿色', modify_time = '2020-05-01' WHERE user_id = '003'; UPDATE `exam`.`userinfo` SET health_code = '黄色', modify_time = '2020-05-01' WHERE user_id = '004'; INSERT INTO `exam`.`userinfo`(user_id, phone_number, health_code, reg_time,modify_time) VALUES ('005', '15953100005', '绿色', '2020-05-01', '2020-05-01'); UPDATE `exam`.`userinfo` SET health_code = '绿色', modify_time = '2020-05-02' WHERE user_id = '004'; UPDATE `exam`.`userinfo` SET health_code = '黄色', modify_time = '2020-05-02' WHERE user_id = '005'; INSERT INTO `exam`.`userinfo`(user_id, phone_number, health_code, reg_time,modify_time) VALUES ('006', '15953100006', '绿色', '2020-05-02', '2020-05-02');
3、在hive里面创建数据库和表CREATE TABLE IF NOT EXISTS `itcast_ods`.`userinfo`( user_id string,-- 人员编号 phone_number string,-- 人员手机号 health_code string,-- 健康码的颜色 reg_time string,-- 用户注册日期 modify_time string-- 健康码颜色修改日期 ) partitioned by (dt string) STORED AS PARQUET; CREATE TABLE IF NOT EXISTS `itcast_dw`.`userinfo`( user_id string,-- 人员编号 phone_number string,-- 人员手机号 health_code string,-- 健康码的颜色 reg_time string,-- 用户注册日期 modify_time string,-- 健康码颜色修改日期 dw_start_date string,-- 生效日期 dw_end_date string-- 失效日期 ) STORED AS PARQUET; -- 将全量数据导入到dw层维度表 insert overwrite table `itcast_dw`.`userinfo` select user_id string, phone_number string, health_code string, reg_time string, modify_time string, modify_time as dw_start_date, '9999-12-31' as dw_end_date from `itcast_ods`.`userinfo` where dt = '20200430'; -- 将增量数据导入到dw层维度表(第一题20200501) insert overwrite table `itcast_dw`.`userinfo` select t1.user_id, t1.phone_number, t1.health_code, t1.reg_time, t1.modify_time, t1.dw_start_date, case when (t2.user_id is not null and t1.dw_end_date > '20200501') then '20200501' else t1.dw_end_date end as dw_end_date from `itcast_dw`.`userinfo` t1 left join (select * from `itcast_ods`.`userinfo` where dt='20200501') t2 on t1.user_id = t2.user_id union all select user_id, phone_number, health_code, reg_time, modify_time, modify_time as dw_start_date, '9999-12-31' as dw_end_date from `itcast_ods`.`userinfo` where dt='20200501' order by dw_start_date, user_id; -- 将增量数据导入到dw层维度表(第一题20200502) insert overwrite table `itcast_dw`.`userinfo` select t1.user_id, t1.phone_number, t1.health_code, t1.reg_time, t1.modify_time, t1.dw_start_date, case when (t2.user_id is not null and t1.dw_end_date > '20200502') then '20200502' else t1.dw_end_date end as dw_end_date from `itcast_dw`.`userinfo` t1 left join (select * from `itcast_ods`.`userinfo` where dt='20200502') t2 on t1.user_id = t2.user_id union all select user_id, phone_number, health_code, reg_time, modify_time, modify_time as dw_start_date, '9999-12-31' as dw_end_date from `itcast_ods`.`userinfo` where dt='20200502' order by dw_start_date, user_id;
本网页所有视频内容由 imoviebox边看边下-网页视频下载, iurlBox网页地址收藏管理器 下载并得到。
ImovieBox网页视频下载器 下载地址: ImovieBox网页视频下载器-最新版本下载
本文章由: imapbox邮箱云存储,邮箱网盘,ImageBox 图片批量下载器,网页图片批量下载专家,网页图片批量下载器,获取到文章图片,imoviebox网页视频批量下载器,下载视频内容,为您提供.
阅读和此文章类似的: 全球云计算