DataX 3.0 Installation and User Manual
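● Oracle
● SQLServer
● PostgreSQL
● DM (达梦)
● Generic RDBMS (supports all relational databases)
Alibaba Cloud data warehouse storage:
● MaxCompute (formerly ODPS)
● AnalyticDB (formerly ADS)
● OSS
● ApsaraDB for Memcache (formerly OCS)
● Hive
NoSQL data storage:
● TableStore (formerly OTS)
● Hbase0.94
● Hbase1.1
● MongoDB
Unstructured data storage:
● TxtFile
● JsonFile
● FTP
● HDFS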
II. DataX Installation
1 Create the user group and user
root@hmaster-hdfs:/home/ubuntu# groupadd datax
root@hmaster-hdfs:/home/ubuntu# useradd -g datax datax -m -d /home/datax
root@hmaster-hdfs:/home/ubuntu# passwd datax
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
root@hmaster-hdfs:/home/ubuntu# usermod -G adm -a datax
root@hmaster-hdfs:/home/ubuntu# su - datax
datax@hmaster-hdfs:~$ id
uid=1004(datax) gid=1000(ubuntu) groups=1000(ubuntu),4(adm),1002(datax)
2 Configure environment variables
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$JAVA_HOME/bin:$PATH
export LANG=zh_CN.UTF-8
3 Install Python
Start the interpreter to confirm that Python is installed:
/home/ubuntu$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
III. Using DataX
1 Generate a configuration file template
Command (run from the bin directory):
python datax.py -r {YOUR_READER} -w {YOUR_WRITER}
Case 1: mysqlreader -> mysqlwriter
~/datax3/datax/bin$ python datax.py -r mysqlreader -w mysqlwriter
DataX (DATAX-OPENSOURCE-1.0), From Alibaba !
Copyright (C) 2010-2015, Alibaba Group. All Rights Reserved.
Please refer to the mysqlreader document:
Please refer to the mysqlwriter document:
Please save the following configuration as a json file and use
python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": [],
                                "table": []
                            }
                        ],
                        "password": "",
                        "username": "",
                        "where": ""
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": [],
                        "connection": [
                            {
                                "jdbcUrl": "",
                                "table": []
                            }
                        ],
                        "password": "",
                        "preSql": [],
                        "session": [],
                        "username": "",
                        "writeMode": ""
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
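The empty template has to be filled in before the job can run. The following is a minimal Python sketch of one way to do that, not part of DataX itself: the helper name build_mysql_job, the JDBC URLs, the table, the columns, and the credentials are all hypothetical placeholders, and "insert" is assumed as the write mode. Note that the reader takes jdbcUrl as a list while the writer takes a single string, as in the template.

```python
import json


def build_mysql_job(src_url, dst_url, table, columns, user, password, channel=1):
    """Return a DataX job dict for a mysqlreader -> mysqlwriter copy.

    All argument values are supplied by the caller; nothing here talks to
    a database. The function only assembles the JSON structure.
    """
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": user,
            "password": password,
            "column": columns,
            # The reader expects a list of JDBC URLs.
            "connection": [{"jdbcUrl": [src_url], "table": [table]}],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": user,
            "password": password,
            "column": columns,
            "writeMode": "insert",  # assumed write mode for this sketch
            # The writer expects a single JDBC URL string.
            "connection": [{"jdbcUrl": dst_url, "table": [table]}],
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    job = build_mysql_job(
        "jdbc:mysql://127.0.0.1:3306/src_db",
        "jdbc:mysql://127.0.0.1:3306/dst_db",
        "demo_table",
        ["id", "name"],
        "demo_user",
        "demo_password",
    )
    print(json.dumps(job, indent=4))
```

Saving the printed JSON to a file gives a job that can be passed to python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json.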
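Case 2: jsonfilereader -> hbasewriter
~/datax3/datax/bin$ python datax.py -r jsonfilereader -w hbasewriter
Please refer to the jsonfilereader document:
Please refer to the hbasewriter document:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "jsonfilereader",
                    "parameter": {
                        "compress": "zip",
                        "encoding": "",
                        "path": []
                    }
                },
                "writer": {
                    "name": "hbase11xwriter",
                    "parameter": {
                        "column": [],
                        "encoding": "",
                        "hbaseConfig": {
                            "hbase.cluster.distributed": "",
                            "hbase.rootdir": "",
                            "hbase.zookeeper.quorum": ""
                        },
                        "mode": "",
                        "rowkeyColumn": [],
                        "table": "",
                        "versionColumn": {
                            "index": "",
                            "value": ""
                        }
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}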
2 Configure scheduled tasks
2.1 Create the task list file
~/datax3/datax/task$ vi crontab.tasks.txt
20,30,40 0-23 * * * /usr/bin/python /home/datax/datax3/datax/bin/datax.py /home/datax/datax3/datax/job/jsonfile2hbase_full[ry_log.terminal_log].json > /data/datax/log/datax.crontab.out.log 2>&1
2.2 Load the task list file with crontab
~/datax3/datax/task$ crontab crontab.tasks.txt
~/datax3/datax/task$ crontab -l
20,30,40 0-23 * * * /usr/bin/python /home/datax/datax3/datax/bin/datax.py /home/datax/datax3/datax/job/jsonfile2hbase_full[ry_log.terminal_log].json > /data/datax/log/datax.crontab.out.log 2>&1
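To make the schedule in this entry concrete, the minute field 20,30,40 and the hour field 0-23 can be expanded by hand or with a few lines of Python. The sketch below is illustrative only: expand_field and runs_per_day are hypothetical helper names, and the parser handles only "*", comma lists, and ranges, not step values such as */5.

```python
def expand_field(field, low, high):
    """Expand one cron field ('*', 'a-b', 'a,b,c') into a sorted list of ints.

    low and high are the bounds used when the field is '*'.
    Step syntax ('*/5') is deliberately not supported in this sketch.
    """
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(low, high + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return sorted(values)


def runs_per_day(minute_field, hour_field):
    """Count how many times per day a job with these two fields fires."""
    minutes = expand_field(minute_field, 0, 59)
    hours = expand_field(hour_field, 0, 23)
    return len(minutes) * len(hours)


if __name__ == "__main__":
    print(expand_field("20,30,40", 0, 59))   # [20, 30, 40]
    print(runs_per_day("20,30,40", "0-23"))  # 72
```

The job therefore starts at minutes 20, 30, and 40 of every hour, 72 times a day, so each run should finish within ten minutes to avoid overlapping with the next one.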
3 Common problems and solutions
3.1 /bin/sh: 1: java: not found
Cause: /home/datax/datax3/datax/bin/datax.py could not obtain the value of the JAVA_HOME environment variable.
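One possible workaround, sketched here under assumptions rather than taken from the original manual, is to launch datax.py through a small wrapper that sets JAVA_HOME and puts its bin directory on PATH explicitly, so the job no longer depends on the environment that cron provides. The names build_env and run_datax are hypothetical; the default interpreter path is the one used in the crontab entry.

```python
import os
import subprocess


def build_env(java_home, base_env=None):
    """Return a copy of the environment with JAVA_HOME set and
    JAVA_HOME/bin placed at the front of PATH (if not already present)."""
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    java_bin = os.path.join(java_home, "bin")
    parts = [p for p in env.get("PATH", "").split(os.pathsep) if p]
    if java_bin not in parts:
        parts.insert(0, java_bin)
    env["PATH"] = os.pathsep.join(parts)
    return env


def run_datax(datax_py, job_json, java_home, python_bin="/usr/bin/python"):
    """Run datax.py on one job file with the augmented environment.

    Returns the exit code of the datax.py process.
    """
    return subprocess.call([python_bin, datax_py, job_json],
                           env=build_env(java_home))
```

The crontab entry would then call the wrapper script instead of datax.py directly, passing the JDK path from the environment variable section (/usr/java/jdk1.7.0_80 in this manual) as java_home.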
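IV. DataX Plugin Development and Installation
1. Reader plugin development
public class JsonFileReader extends Reader {
    public static class Job extends Reader.Job {
        @Override
        public void init() {
            // TODO Auto-generated method stub
        }
        @Override
        public void destroy() {
        }
        @Override
        public List<Configuration> split(int adviceNumber) {
            return null;
        }
    }
    public static class Task extends Reader.Task {
        @Override
        public void init() {
        }
        @Override
        public void destroy() {
        }
        @Override
        public void startRead(RecordSender recordSender) {
        }
    }
}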
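2. Writer plugin development
public class JsonFileWriter extends Writer {
    public static class Job extends Writer.Job {
        @Override
        public void init() {}
        @Override
        public void destroy() {}
        @Override
        public List<Configuration> split(int mandatoryNumber) {
            return null;
        }
    }
    public static class Task extends Writer.Task {
        @Override
        public void init() {}
        @Override
        public void destroy() {}
        @Override
        public void startWrite(RecordReceiver lineReceiver) {}
    }
}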
3. Plugin deployment
Directory and file description
~/datax3/datax/plugin/reader/jsonfilereader$ ll
total 56
drwxr-xr-x  3 datax ubuntu  4096 Jul  6 16:10 ./
drwxr-xr-x 21 datax ubuntu  4096 Jun 21 11:05 ../
-rw-r--r--  1 datax ubuntu 36348 Jun 22 17:52 jsonfilereader.jar
drwxr-xr-x  2 datax ubuntu  4096 May 16 12:46 libs/
-rw-------  1 datax ubuntu   315 May 16 11:24 plugin.json
-rw-------  1 datax ubuntu   149 May 15 18:24 plugin_job_template.json
● jsonfilereader.jar: the plugin JAR file
● libs: third-party JAR files that the plugin depends on
● plugin.json: the plugin description file
● plugin_job_template.json: the plugin configuration template file
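plugin.json file contents
{
    "name": "jsonfilereader",
    "class": "com.alibaba.datax.plugin.reader.jsonfilereader.JsonFileReader",
    "description": "useScene: test. mechanism: use datax framework to transport data from json file. warn: The more you know about the data, the less problems you encounter.",
    "developer": "alibaba"
}
name: the plugin name
class: the fully qualified name of the plugin class inside jsonfilereader.jar
description: a description of what the plugin does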
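Before deploying a plugin it can help to confirm that plugin.json parses and contains the fields shown above. The following Python sketch is one way to do that check. It assumes that the four keys in the example (name, class, description, developer) are the ones worth verifying, which is inferred from this manual rather than from a DataX specification, and check_plugin_json is a hypothetical helper name.

```python
import json

# Keys taken from the plugin.json example in this manual (an assumption,
# not an official list of required fields).
EXPECTED_KEYS = ("name", "class", "description", "developer")


def check_plugin_json(text):
    """Parse a plugin.json document and return a list of problems found.

    An empty list means every expected key is present and non-empty.
    """
    meta = json.loads(text)
    problems = []
    for key in EXPECTED_KEYS:
        if not meta.get(key):
            problems.append("missing or empty key: " + key)
    return problems


if __name__ == "__main__":
    sample = """
    {
        "name": "jsonfilereader",
        "class": "com.alibaba.datax.plugin.reader.jsonfilereader.JsonFileReader",
        "description": "useScene: test.",
        "developer": "alibaba"
    }
    """
    print(check_plugin_json(sample))  # []
```

To check a deployed plugin, read the file at plugin/reader/jsonfilereader/plugin.json and pass its text to check_plugin_json.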
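plugin_job_template.json file contents
This file holds the plugin's name and parameter block, which is what python datax.py -r jsonfilereader prints as the reader section of the generated template (see Case 2).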
V. Usage Tips