Project name: HAWQ
Project URL: https://gitee.com/apache/hawq
Project introduction:
CI Process | Status
---|---
Travis CI Build |
Apache Release Audit Tool (RAT) |
Coverity Static Analysis |
Website | Wiki | Documentation | Developer Mailing List | User Mailing List | Q&A Collections | Open Defect

# Apache HAWQ
Apache HAWQ is a Hadoop native SQL query engine that combines the key technological advantages of an MPP database with the scalability and convenience of Hadoop. HAWQ reads data from and writes data to HDFS natively. HAWQ delivers industry-leading performance and linear scalability, and provides users the tools to confidently and successfully interact with petabyte-range data sets. HAWQ provides users with a complete, standards-compliant SQL interface. More specifically, HAWQ has the following features:

- On-premise or cloud deployment
- Robust ANSI SQL compliance: SQL-92, SQL-99, SQL-2003, OLAP extension
- Extremely high performance: many times faster than other Hadoop SQL engines
- World-class parallel optimizer
- Full transaction capability and consistency guarantee: ACID
- Dynamic data flow engine through a high-speed UDP-based interconnect
- Elastic execution engine based on virtual segment & data locality
- Support for multi-level partitioning and List/Range-based partitioned tables
- Multiple compression method support: snappy, gzip, zlib
- Multi-language user defined function support: Python, Perl, Java, C/C++, R
- Advanced machine learning and data mining functionalities through MADLib
- Dynamic node expansion: in seconds
- Most advanced three-level resource management: integrates with YARN and provides hierarchical resource queues
- Easy access to all HDFS data and external system data (for example, HBase)
- Hadoop Native: from storage (HDFS), resource management (YARN) to deployment (Ambari).
- Authentication & Granular authorization: Kerberos, SSL and role based access
- Advanced C/C++ access library to HDFS and YARN: libhdfs3 & libYARN
- Support for most third-party tools: Tableau, SAS, et al.
- Standard connectivity: JDBC/ODBC
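
Because HAWQ is derived from PostgreSQL, its JDBC/ODBC connectivity rides on the standard PostgreSQL wire protocol. A minimal smoke test with psql, where the host, port, and database name are assumptions for a default single-node install:

```bash
# Connect to a local HAWQ master and run a trivial query.
# localhost/5432/postgres are assumptions for a default single-node install.
psql -h localhost -p 5432 -d postgres -c "SELECT version();"

# JDBC clients would use an equivalent URL, e.g.:
#   jdbc:postgresql://localhost:5432/postgres
```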
## Build & Setup HAWQ on Mac

### Step 1 Setup HDFS

Install Homebrew (see https://brew.sh).

#### Step 1.1 Configure HDFS parameters

${HADOOP_HOME}/etc/hadoop/slaves
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/slaves

${HADOOP_HOME}/etc/hadoop/core-site.xml
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/core-site.xml

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
```

${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
For example, /usr/local/Cellar/hadoop/2.8.1/libexec/etc/hadoop/hdfs-site.xml

Attention: Replace the ${HADOOP_DATA_DIRECTORY} and ${USER_NAME} variables with your own specific values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${HADOOP_DATA_DIRECTORY}/name</value>
    <description>Specify your dfs namenode dir path</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${HADOOP_DATA_DIRECTORY}/data</value>
    <description>Specify your dfs datanode dir path</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
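
The two dfs.*.dir locations above must exist and be writable before the namenode is formatted in Step 1.4. A minimal sketch, with the data root as a hypothetical placeholder you substitute with your own path:

```bash
# HADOOP_DATA_DIRECTORY is a placeholder; substitute your own path.
HADOOP_DATA_DIRECTORY=/Users/$USER/data/hadoop

# Create the namenode and datanode directories referenced in hdfs-site.xml.
mkdir -p "${HADOOP_DATA_DIRECTORY}/name" "${HADOOP_DATA_DIRECTORY}/data"
```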
#### Step 1.2 Configure HDFS environment

```bash
touch ~/.bashrc
touch ~/.bash_profile

echo "if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi" >> ~/.bash_profile

echo "export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.1/libexec" >> ~/.bashrc
# Escape the $ so expansion happens at shell startup, after HADOOP_HOME is set.
echo "export PATH=\$PATH:\${HADOOP_HOME}/bin:\${HADOOP_HOME}/sbin" >> ~/.bashrc

source ~/.bashrc
```

#### Step 1.3 Setup passphraseless ssh

```bash
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

Now you can `ssh localhost` without a passphrase. If you get a "Port 22: Connection refused" error, turn on Remote Login in your Mac's System Preferences -> Sharing.

#### Step 1.4 Format the HDFS filesystem

Run `hdfs namenode -format` (the standard command for initializing a fresh HDFS filesystem).

#### Step 1.5 Start HDFS

```bash
# start/stop HDFS
start-dfs.sh    # (use stop-dfs.sh to stop)

# Do some basic tests to make sure HDFS works
hdfs dfsadmin -report
hadoop fs -ls /
```

When things go wrong, check the logs and consult the FAQ in the wiki first.

### Step 2 Setup HAWQ

#### Step 2.1 System configuration

##### Step 2.1.1 Turn off Rootless System Integrity Protection

Turn off Rootless System Integrity Protection on macOS versions newer than OS X El Capitan 10.11 if you encounter tricky LIBRARY_PATH problems, e.g. HAWQ-513, which prevent the hawq binary from finding its shared library dependencies. Steps:

- Reboot the Mac and hold down the Command + R keys simultaneously after you hear the startup chime; this will boot OS X into Recovery Mode
- When the “OS X Utilities” screen appears, pull down the “Utilities” menu at the top of the screen and choose “Terminal”
- Type the following command into the terminal, then hit Return: `csrutil disable; reboot`
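
After the machine comes back up, you can confirm the change with `csrutil status`, the standard macOS utility for querying SIP state:

```bash
# Should print "System Integrity Protection status: disabled." after the steps above.
csrutil status
```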
##### Step 2.1.2 Configure sysctl.conf

For Mac OSX 10.10 / 10.11, add the following content to /etc/sysctl.conf and then run `sudo sysctl -p` to activate it. For Mac OSX 10.12+, add the following content to /etc/sysctl.conf and then run `cat /etc/sysctl.conf | xargs sudo sysctl` to check.

```
kern.sysv.shmmax=2147483648
kern.sysv.shmmin=1
kern.sysv.shmmni=64
kern.sysv.shmseg=16
kern.sysv.shmall=524288
kern.maxfiles=65535
kern.maxfilesperproc=65536
kern.corefile=/cores/core.%N.%P
```

#### Step 2.2 Prepare source code and target folder

```bash
mkdir ~/dev
git clone git@github.com:apache/hawq ~/dev/hawq
sudo mkdir -p /opt
sudo chmod a+w /opt
sudo install -o $USER -d /usr/local/hawq
```

#### Step 2.3 Setup toolchain and thirdparty dependency

Follow the toolchain and third-party dependency setup guide in the HAWQ wiki.

#### Step 2.4 Build HAWQ

See the HAWQ wiki page for build and install commands: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install

#### Step 2.5 Configure HAWQ

```bash
mkdir /tmp/magma_master
mkdir /tmp/magma_segment
```

Feel free to use the default /usr/local/hawq/etc/hawq-site.xml. Pay attention to mapping hawq_dfs_url to fs.defaultFS in ${HADOOP_HOME}/etc/hadoop/core-site.xml.

#### Step 2.6 Init/Start/Stop HAWQ

```bash
# Before initializing HAWQ, you need to install HDFS and make sure it works.

source /usr/local/hawq/greenplum_path.sh

# Besides, you need to set up password-less ssh on the systems.
# If you only install hawq for development on localhost, skip this step.
# Exchange SSH keys between the hosts host1, host2, and host3:
# hawq ssh-exkeys -h host1 -h host2 -h host3

# Initialize the HAWQ cluster and start HAWQ by default
hawq init cluster -a

# Now you can stop/restart/start the cluster using: hawq stop/restart/start cluster
# The init command invokes the start command automatically too.

# HAWQ master and segments are completely decoupled,
# so you can also init, start, or stop the master and segments separately.
# For example, to init:  hawq init master,  then hawq init segment
#              to stop:  hawq stop master,  then hawq stop segment
#              to start: hawq start master, then hawq start segment
```

Every time you init HAWQ you need to delete some files. The directories holding these files are configured in /usr/local/hawq/etc/hawq-site.xml:

| Name | Description |
|------|-------------|
| hawq_dfs_url | URL for accessing HDFS |
| hawq_master_directory | The directory of the HAWQ master |
| hawq_segment_directory | The directory of the HAWQ segment |
| hawq_magma_locations_master | HAWQ magma service locations on the master |
| hawq_magma_locations_segment | HAWQ magma service locations on segments |
For example:

```bash
hdfs dfs -rm -r /hawq*
rm -rf /Users/xxx/data/hawq/master/*
rm -rf /Users/xxx/data/hawq/segment/*
rm -rf /Users/xxx/data/hawq/tmp/magma_master/*
rm -rf /Users/xxx/data/hawq/tmp/magma_segment/*
```

Also check whether any postgres or magma processes are running on your machine. If they are, you must kill them before you init HAWQ. For example:

```bash
ps -ef | grep postgres | grep -v grep | awk '{print $2}' | xargs kill -9
ps -ef | grep magma | grep -v grep | awk '{print $2}' | xargs kill -9
```
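
Before re-running `hawq init`, it also pays to re-check the Step 2.5 requirement that hawq_dfs_url matches fs.defaultFS. A minimal sketch, assuming default install paths and that each property's `<name>` and `<value>` sit on adjacent lines:

```bash
# Compare the HDFS endpoint HAWQ will use against the one HDFS advertises;
# the host:port portions of the two values must match.
grep -A 1 'hawq_dfs_url' /usr/local/hawq/etc/hawq-site.xml
grep -A 1 'fs.defaultFS' "${HADOOP_HOME}/etc/hadoop/core-site.xml"
```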
## Build HAWQ on Centos 7

Almost the same as on macOS; feel free to have a try.

## Build HAWQ on Centos 7 (6.X) using docker

Almost the same as on macOS; feel free to have a try.

## Build & Install & Test (Apache HAWQ Version)
Please see the HAWQ wiki page: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install

To make the output consistent, please create a new database and use a specific locale:

```bash
TEST_DB_NAME="hawq_feature_test_db"
psql -d postgres -c "create database $TEST_DB_NAME;"
export PGDATABASE=$TEST_DB_NAME
psql -c "alter database $TEST_DB_NAME set lc_messages to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_monetary to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_numeric to 'C';"
psql -c "alter database $TEST_DB_NAME set lc_time to 'C';"
psql -c "alter database $TEST_DB_NAME set timezone_abbreviations to 'Default';"
psql -c "alter database $TEST_DB_NAME set timezone to 'PST8PDT';"
psql -c "alter database $TEST_DB_NAME set datestyle to 'postgres,MDY';"
```
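
As a quick sanity check that the locale settings took effect, you can read a couple of them back. A minimal sketch using standard PostgreSQL `SHOW` commands:

```bash
# New sessions pick up the per-database defaults set above.
psql -d "$TEST_DB_NAME" -c "SHOW lc_messages;"  # expect: C
psql -d "$TEST_DB_NAME" -c "SHOW datestyle;"    # expect: Postgres, MDY
```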
To run the normal feature tests, please use the filters below.

- The tests below can only run in sequence mode:

```bash
hawq/src/test/feature/feature-test --gtest_filter=-TestHawqRegister.*:TestTPCH.TestStress:TestHdfsFault.*:TestZookeeperFault.*:TestHawqFault.*
```

- The tests below can run in parallel:
```bash
cd hawq/src/test/feature/
mkdir -p testresult
python ./gtest-parallel --workers=4 --output_dir=./testresult --print_test_times ./feature-test --gtest_filter=-TestHawqRegister.*:TestTPCH.*:TestHdfsFault.*:TestZookeeperFault.*:TestHawqFault.*:TestQuitQuery.*:TestErrorTable.*:TestExternalTableGpfdist.*:TestExternalTableOptionMultibytesDelimiter.TestGpfdist:TETAuth.*
```

Notes on the filters:

- TestHawqRegister is not included
- TestTPCH.TestStress is the TPC-H stress test
- TestHdfsFault: HDFS fault tests
- TestZookeeperFault: Zookeeper fault tests
- TestHawqFault: HAWQ fault tolerance tests
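
When narrowing down a failure, it can help to see exactly which tests a filter selects before running anything. `--gtest_list_tests` is a standard googletest flag, and the suite in the second command is just an example drawn from the filters above:

```bash
cd hawq/src/test/feature

# Print the tests that survive a filter, without running them.
./feature-test --gtest_list_tests --gtest_filter=-TestHawqRegister.*

# Run a single suite in isolation (example suite from the filters above).
./feature-test --gtest_filter=TestErrorTable.*
```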
## Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.