DataX 3.0 User Manual
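I. DataX Overview
1. Purpose of DataX
DataX is an offline synchronization tool for heterogeneous data sources. It provides stable, efficient data synchronization between sources such as relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, and FTP.
To solve the heterogeneous synchronization problem, DataX replaces the complex mesh of point-to-point synchronization links with a star topology, in which DataX is the central transport that connects every data source.
To bring in a new data source, you only need to connect it to DataX; it can then synchronize seamlessly with every data source already supported.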
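The benefit of the star layout can be made concrete by counting links. The sketch below assumes that every data source must be able to synchronize to every other one in both directions, and that under DataX each source needs one Reader and one Writer plugin. The function names are illustrative only and are not part of DataX.

```python
def mesh_links(n):
    """Directed point-to-point links needed when every source syncs to every other."""
    return n * (n - 1)


def star_plugins(n):
    """Plugins needed in a star topology: one Reader and one Writer per source."""
    return 2 * n


# Compare both layouts for a growing number of data sources.
for n in (3, 6, 10):
    print(n, mesh_links(n), star_plugins(n))
```

With 10 sources, the mesh needs 90 dedicated links while the star needs 20 plugins, and adding an eleventh source costs two more plugins rather than 20 more links.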
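2. DataX 3.0 Framework Design
As an offline data synchronization framework, DataX is built on a Framework + plugin architecture. Reading from and writing to data sources are abstracted as Reader and Writer plugins that plug into the synchronization framework.
Reader: the data collection module. It reads data from the source and sends it to the Framework.
Writer: the data writing module. It continuously pulls data from the Framework and writes it to the destination.
Framework: connects the Reader and the Writer, serves as the data transfer channel between them, and handles core concerns such as buffering, flow control, concurrency, and data conversion.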
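The division of roles can be illustrated with a small producer/consumer sketch. This is an analogy under simplifying assumptions (one reader thread, one writer thread, an in-memory bounded queue standing in for the channel), not DataX's actual implementation. All names are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def reader(channel, rows):
    """Reader role: collect records from the source and hand them to the channel."""
    for row in rows:
        channel.put(row)  # blocks when the buffer is full (a crude form of flow control)
    channel.put(_DONE)


def writer(channel, sink):
    """Writer role: keep pulling records from the channel and write them out."""
    while True:
        row = channel.get()
        if row is _DONE:
            break
        sink.append(row)


def run_job(rows, buffer_size=2):
    """Framework role: wire a reader and a writer together through a bounded buffer."""
    channel = queue.Queue(maxsize=buffer_size)
    sink = []
    threads = [
        threading.Thread(target=reader, args=(channel, rows)),
        threading.Thread(target=writer, args=(channel, sink)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sink


print(run_job([("1", "alice"), ("2", "bob"), ("3", "carol")]))
```

The reader and writer never reference each other; they only see the channel, which is what lets any Reader be paired with any Writer.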
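3. DataX 3.0 Plugin System
Type                                   Data source                                        Reader   Writer
RDBMS (relational databases)           MySQL                                              √        √
                                       Oracle                                             √        √
                                       SQLServer                                          √        √
                                       PostgreSQL                                         √        √
                                       DM (Dameng)                                        √        √
                                       Generic RDBMS (supports all relational databases)  √        √
Alibaba Cloud data warehouse storage   MaxCompute (formerly ODPS)                         √        √
                                       AnalyticDB (formerly ADS)                          √ (one only) *
                                       OSS                                                √        √
                                       ApsaraDB for Memcache (formerly OCS)               √        √
                                       Hive                                               √ (one only) *
NoSQL data storage                     TableStore (formerly OTS)                          √        √
                                       Hbase0.94                                          √        √
                                       Hbase1.1                                           √        √
                                       MongoDB                                            √        √
Unstructured data storage              TxtFile                                            √        √
                                       JsonFile                                           √ (one only) *
                                       FTP                                                √        √
                                       HDFS                                               √        √
* Only one of Reader/Writer is supported for this data source; the original table does not preserve which.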
II. DataX Installation
1. Create the user group and user
root@hmaster-hdfs:/home/ubuntu# groupadd datax
root@hmaster-hdfs:/home/ubuntu# useradd -g datax datax -m -d /home/datax
root@hmaster-hdfs:/home/ubuntu# passwd datax
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
root@hmaster-hdfs:/home/ubuntu# usermod -G adm -a datax
root@hmaster-hdfs:/home/ubuntu# su - datax
datax@hmaster-hdfs:~$ id
uid=1004(datax) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),1002(datax)
2. Configure environment variables
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export LANG=zh_CN.UTF-8
3. Install Python
datax@hmaster-hdfs:/home/ubuntu$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
III. Using DataX
1. Generate a configuration file template
Command (run from the DataX bin directory):
python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
Example 1: mysqlreader -> mysqlwriter
datax@hmaster-hdfs:~/datax3/datax/bin$ python datax.py -r mysqlreader -w mysqlwriter
DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
Please refer to the mysqlwriter document:
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
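As an illustration of how the generated template might be filled in, the following sketch builds a MySQL-to-MySQL job as a Python dictionary and prints it as JSON. Every concrete value (hosts, database names, table, columns, credentials, the "insert" write mode, and a channel count of 1) is a hypothetical placeholder chosen for the example. Note that the reader's jdbcUrl is a list while the writer's is a single string, as in the template.

```python
import json

# All connection details below are hypothetical placeholders.
job = {
    "job": {
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "src_user",
                    "password": "src_password",
                    "column": ["id", "name"],
                    "where": "",
                    "connection": [{
                        "jdbcUrl": ["jdbc:mysql://127.0.0.1:3306/src_db"],  # list for the reader
                        "table": ["user"],
                    }],
                },
            },
            "writer": {
                "name": "mysqlwriter",
                "parameter": {
                    "username": "dst_user",
                    "password": "dst_password",
                    "column": ["id", "name"],
                    "writeMode": "insert",
                    "preSql": [],
                    "session": [],
                    "connection": [{
                        "jdbcUrl": "jdbc:mysql://127.0.0.1:3306/dst_db",  # string for the writer
                        "table": ["user"],
                    }],
                },
            },
        }],
        "setting": {"speed": {"channel": 1}},
    }
}

# Print the job in the same sorted, indented layout as the generated template.
print(json.dumps(job, indent=4, sort_keys=True))
```

Saved under a name such as mysql2mysql.json (a hypothetical file name), the output could then be passed to python {DATAX_HOME}/bin/datax.py as described in the tool's own message.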
Example 2: jsonfilereader -> hbasewriter
datax@hmaster-hdfs:~/datax3/datax/bin$ python datax.py -r jsonfilereader -w hbasewriter
DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
Please refer to the jsonfilereader document:
Please refer to the hbasewriter document:
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "jsonfilereader",
                    "parameter": {
                        "column": [],
                        "compress": "zip",
                        "encoding": "",
                        "path": []
                    }
                },
                "writer": {
                    "name": "hbase11xwriter",
                    "parameter": {
                        "column": [],
                        "encoding": "",
                        "hbaseConfig": {
                            "hbase.cluster.distributed": "",
                            "hbase.rootdir": "",
                            "hbase.zookeeper.quorum": ""
                        },
                        "mode": "",
                        "rowkeyColumn": [],
                        "table": "",
                        "versionColumn": {
                            "index": "",
                            "value": ""
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
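Generated templates leave most values empty, so it can help to list what still needs attention before running a job. The helper below is a hypothetical sketch that walks a job dictionary and reports every value that is still an empty string, list, or object. An empty value is not necessarily an error (some parameters may be optional), so the output is a checklist rather than a validation result. The sample template is a trimmed-down version of the one above.

```python
def unfilled_fields(node, prefix=""):
    """Yield the paths of values that are still empty strings, lists, or dicts."""
    if isinstance(node, dict):
        if not node and prefix:
            yield prefix
        for key, value in node.items():
            yield from unfilled_fields(value, prefix + "." + key if prefix else key)
    elif isinstance(node, list):
        if not node:
            yield prefix
        for index, value in enumerate(node):
            yield from unfilled_fields(value, "%s[%d]" % (prefix, index))
    elif node == "":
        yield prefix


# Trimmed-down template, for illustration only.
template = {
    "job": {
        "content": [{
            "reader": {"name": "jsonfilereader",
                       "parameter": {"path": [], "compress": "zip"}},
            "writer": {"name": "hbase11xwriter",
                       "parameter": {"table": "", "mode": ""}},
        }],
        "setting": {"speed": {"channel": ""}},
    }
}

for path in unfilled_fields(template):
    print(path)
```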
2. Configure scheduled tasks
2.1 Create the task list file
datax@hmaster-hdfs:~/datax3/datax/task$ vi crontab.tasks.txt
20,30,40 0-23 * * * /usr/bin/python /home/datax/datax3/datax/bin/datax.py /home/datax/datax3/datax/job/jsonfile2hbase_full[ry_log.terminal_log].json >/data/datax/log/datax.crontab.out.log 2>&1
2.2 Load the task list file with crontab
datax@hmaster-hdfs:~/datax3/datax/task$ crontab crontab.tasks.txt
datax@hmaster-hdfs:~/datax3/datax/task$ crontab -l
20,30,40 0-23 * * * /usr/bin/python /home/datax/datax3/datax/bin/datax.py /home/datax/datax3/datax/job/jsonfile2hbase_full[ry_log.terminal_log].json >/data/datax/log/datax.crontab.out.log 2>&1
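To see what the schedule `20,30,40 0-23 * * *` means in practice, the sketch below expands the minute and hour fields into the times at which the job fires each day. It is a deliberately small parser written for this example: it handles only `*`, comma lists, and `a-b` ranges (no step syntax such as `*/5`), and it assumes the day, month, and weekday fields are all `*`.

```python
def expand_field(field, low, high):
    """Expand one cron field ('*', 'a,b,c', or 'a-b') into a sorted list of ints."""
    if field == "*":
        return list(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def daily_run_times(schedule):
    """Return the HH:MM times at which a 'minute hour ...' schedule fires each day."""
    minute_field, hour_field = schedule.split()[:2]
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return ["%02d:%02d" % (hour, minute) for hour in hours for minute in minutes]


times = daily_run_times("20,30,40 0-23 * * *")
print(len(times), times[:3], times[-1])
```

The job therefore runs 72 times a day, at minutes 20, 30, and 40 of every hour.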
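3. Common problems and solutions
3.1 /bin/sh: 1: java: not found
Cause: /home/datax/datax3/datax/bin/datax.py could not obtain the value of the JAVA_HOME environment variable. This typically happens when the job is launched from cron, which does not load the user's login profile.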
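One way to diagnose this is to check whether a `java` executable is reachable from a given set of environment variables. The helper below is a hypothetical sketch for Unix-like systems: it looks in `$JAVA_HOME/bin` first and then in each directory on `$PATH`. The "cron-like" environment is an assumption about a minimal cron setup, not a value taken from this installation.

```python
import os


def find_java(env):
    """Return the path of a 'java' executable visible in the given environment, or None.

    Checks $JAVA_HOME/bin first, then every directory on $PATH.
    """
    candidates = []
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidates.append(os.path.join(java_home, "bin", "java"))
    for directory in env.get("PATH", "").split(os.pathsep):
        if directory:
            candidates.append(os.path.join(directory, "java"))
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# Assumed cron-like environment: a minimal PATH and no JAVA_HOME.
cron_like = {"PATH": "/usr/bin:/bin"}
print("cron-like environment:", find_java(cron_like))
print("current environment:  ", find_java(dict(os.environ)))
```

If the cron-like lookup finds nothing while the login environment does, one possible remedy, assuming a cron implementation that accepts variable assignments in the crontab file, is to define a literal PATH that includes the JDK's bin directory at the top of crontab.tasks.txt, or to start the job through a wrapper script that exports the variables from section II.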
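IV. DataX Plugin Development and Installation
1. Reader plugin development
import java.util.List;

import com.alibaba.datax.common.plugin.RecordSender;
import com.alibaba.datax.common.spi.Reader;
import com.alibaba.datax.common.util.Configuration;

public class JsonFileReader extends Reader {

    // Job: runs once per job; responsible for splitting the work into tasks.
    public static class Job extends Reader.Job {
        @Override
        public void init() {
            // TODO Auto-generated method stub
        }

        @Override
        public void destroy() {
            // TODO Auto-generated method stub
        }

        @Override
        public List<Configuration> split(int adviceNumber) {
            // TODO Auto-generated method stub
            return null;
        }
    }

    // Task: runs once per split; reads the data and sends it to the framework.
    public static class Task extends Reader.Task {
        @Override
        public void init() {
            // TODO Auto-generated method stub
        }

        @Override
        public void destroy() {
            // TODO Auto-generated method stub
        }

        @Override
        public void startRead(RecordSender recordSender) {
            // TODO Auto-generated method stub
        }
    }
}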
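The stub leaves `split` unimplemented. As a language-neutral illustration of what a file-based reader's split step might do, the Python sketch below distributes a list of input paths round-robin across at most `advice_number` task configurations. This is an assumption about one reasonable strategy, not the logic of the actual jsonfilereader plugin; the function name and the file names are hypothetical.

```python
def split_paths(paths, advice_number):
    """Distribute file paths round-robin into at most `advice_number` task configs."""
    task_count = max(1, min(advice_number, len(paths)))
    buckets = [[] for _ in range(task_count)]
    for index, path in enumerate(paths):
        buckets[index % task_count].append(path)
    # Each dict plays the role of one task's configuration.
    return [{"path": bucket} for bucket in buckets]


configs = split_paths(["a.json", "b.json", "c.json", "d.json", "e.json"], 2)
print(configs)
```

Each resulting configuration would then be handed to one Task instance, whose read method processes only the paths it was given.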
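2. Writer plugin development
import java.util.List;

import com.alibaba.datax.common.plugin.RecordReceiver;
import com.alibaba.datax.common.spi.Writer;
import com.alibaba.datax.common.util.Configuration;

public class JsonFileWriter extends Writer {

    // Job: runs once per job; responsible for splitting the work into tasks.
    public static class Job extends Writer.Job {
        @Override
        public void init() {
            // TODO Auto-generated method stub
        }

        @Override
        public void destroy() {
            // TODO Auto-generated method stub
        }

        @Override
        public List<Configuration> split(int mandatoryNumber) {
            // TODO Auto-generated method stub
            return null;
        }
    }

    // Task: runs once per split; receives records from the framework and writes them.
    public static class Task extends Writer.Task {
        @Override
        public void init() {
            // TODO Auto-generated method stub
        }

        @Override
        public void destroy() {
            // TODO Auto-generated method stub
        }

        @Override
        public void startWrite(RecordReceiver lineReceiver) {
            // TODO Auto-generated method stub
        }
    }
}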
3. Plugin deployment
Directories and files
datax@hmaster-hdfs:~/datax3/datax/plugin/reader/jsonfilereader$ ll
total 56
drwxr-xr-x  3 datax ubuntu  4096 Jul  6 16:10 ./
drwxr-xr-x 21 datax ubuntu  4096 Jun 21 11:05 ../
-rw-r--r--  1 datax ubuntu 36348 Jun 22 17:52 jsonfilereader.jar
drwxr-xr-x  2 datax ubuntu  4096 May 16 12:46 libs/
-rw-------  1 datax ubuntu   315 May 16 11:24 plugin.json
-rw-------  1 datax ubuntu   149 May 15 18:24 plugin_job_template.json
● jsonfilereader.jar: the plugin JAR file
● libs: third-party JAR files the plugin depends on
● plugin.json: the plugin description file
● plugin_job_template.json: the plugin configuration template file
Contents of plugin.json
{
    "name": "jsonfilereader",
    "class": "com.alibaba.datax.plugin.reader.jsonfilereader.JsonFileReader",
    "description": "useScene: test. mechanism: use datax framework to transport data from jsonfile. warn: The more you know about the data, the less problems you encounter.",
    "developer": "alibaba"
}
name: the plugin name
class: the fully qualified name of the plugin class inside jsonfilereader.jar
description: a description of what the plugin does
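A quick sanity check of a plugin.json file can catch deployment mistakes early. The sketch below is hypothetical: it treats the four keys seen in the example above as required (an assumption; DataX itself may enforce different rules) and checks that the class name looks fully qualified. The description in the sample is shortened for brevity.

```python
import json

# Keys seen in the manual's example; treating all of them as required is an assumption.
REQUIRED_KEYS = ("name", "class", "description", "developer")


def check_plugin_json(text):
    """Parse plugin.json text and return a list of problems (empty if none found)."""
    plugin = json.loads(text)
    problems = ["missing or empty key: " + key
                for key in REQUIRED_KEYS if not plugin.get(key)]
    class_name = plugin.get("class", "")
    if class_name and "." not in class_name:
        problems.append("class should be fully qualified: " + class_name)
    return problems


example = """
{
    "name": "jsonfilereader",
    "class": "com.alibaba.datax.plugin.reader.jsonfilereader.JsonFileReader",
    "description": "useScene: test.",
    "developer": "alibaba"
}
"""
print(check_plugin_json(example))
```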
Contents of plugin_job_template.json
{
    "name": "jsonfilereader",
    "parameter": {
        "path": [],
        "encoding": "",
        "column": [],
        "compress": "zip"
    }
}
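The plugin template has the same shape as the "reader" entry in the job templates of section III, which suggests how the two relate: a job template can be assembled by wrapping one reader template and one writer template in the job skeleton. The sketch below shows that composition under this assumption; it is not a copy of datax.py's own code, and the writer template is a hypothetical, trimmed-down stand-in.

```python
import json


def build_job_template(reader_template, writer_template):
    """Wrap a reader and a writer plugin template in the job skeleton from section III."""
    return {
        "job": {
            "content": [{"reader": reader_template, "writer": writer_template}],
            "setting": {"speed": {"channel": ""}},
        }
    }


reader_template = json.loads("""
{
    "name": "jsonfilereader",
    "parameter": {"path": [], "encoding": "", "column": [], "compress": "zip"}
}
""")
# Hypothetical, trimmed-down writer template used only for this illustration.
writer_template = {"name": "hbase11xwriter", "parameter": {"table": "", "column": []}}

job = build_job_template(reader_template, writer_template)
print(json.dumps(job, indent=4, sort_keys=True))
```

Printing with sorted keys reproduces the alphabetical parameter order seen in the generated templates.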
V. Usage Tips