[文章作者:张宴 本文版本:v1.1 最后修改:2010.05.18 转载请注明原文链接:http://blog.zyan.cc/infobright/]

  Infobright是一个与MySQL集成的开源数据仓库(Data Warehouse)软件,可作为MySQL的一个存储引擎来使用,SELECT查询与普通MySQL无区别。

  一、Infobright的基本特征:
  优点:
  查询性能高:百万、千万、亿级记录数条件下,同等的SELECT查询语句,速度比MyISAM、InnoDB等普通的MySQL存储引擎快5~60倍
  存储数据量大:TB级数据大小,几十亿条记录
  高压缩比:在我们的项目中为18:1,极大地节省了数据存储空间
  基于列存储:无需建索引,无需分区
  适合复杂的分析性SQL查询:SUM, COUNT, AVG, GROUP BY

  限制:
  不支持数据更新:社区版Infobright只能使用“LOAD DATA INFILE”的方式导入数据,不支持INSERT、UPDATE、DELETE
  不支持高并发:只能支持10多个并发查询



  二、Infobright 安装与基本用法:
  1、下载安装社区版Infobright二进制Linux版本,端口3307
ulimit -SHn 65535
mkdir -p /data0/mysql/3307
/usr/sbin/groupadd mysql
/usr/sbin/useradd -g mysql mysql

cd /usr/local


  ①、64位系统:
wget http://www.infobright.org/downloads/ice/infobright-3.3.1-x86_64-ice.tar.gz
tar zxvf infobright-3.3.1-x86_64-ice.tar.gz
mv infobright-3.3.1-x86_64 infobright


  ②、32位系统:
wget http://www.infobright.org/downloads/ice/infobright-3.3.1-i686-ice.tar.gz
tar zxvf infobright-3.3.1-i686-ice.tar.gz
mv infobright-3.3.1-i686 infobright




cd infobright
./install-infobright.sh --datadir=/data0/mysql/3307/data --cachedir=/data0/mysql/3307/cache --config=/data0/mysql/3307/my.cnf --port=3307 --socket=/tmp/mysql3307.sock --user=mysql --group=mysql




  2、开始安装,提示以下信息:
Infobright installation script is running...
Checking system configuration...
Infobright license agreement...
System tool 'Less' - a text file viewer will be used to display license agreement.
Please only use up/down arrow keys for scrolling license text and press Q when finished reading.
Press R -Read license agreement, N -Exit the installation [R/N]:


  选择R,空格翻页到页尾,看到以下提示时,选择Q继续安装:

                     END OF TERMS AND CONDITIONS

============ Press Q to continue installation ==========
(END)


  接下来会显示以下信息,选择Y同意:
Press Y -I agree, Any other key -I do not agree [Y/*]:

  这时,会提示是否在线注册,选择N不注册:
Installation has been made for system user root and mysql.
Please see README or User guide for instructions related to start/stop the Infobright server and connect to it.
Register your copy of ICE and receive a free copy of the User Manual (a $50 value) as well as a copy of the Bloor Research Spotlight Report "What's Cool About Columns" which explains the differences and benefits of a columnar versus row database.
Registration will require opening an HTTP connection to Infobright, do you wish to register now? [Y/N]:




  3、修改Infobright内存使用限制
vi /data0/mysql/3307/data/brighthouse.ini

  根据自身的物理内存大小修改ServerMainHeapSize、ServerCompressedHeapSize、LoaderMainHeapSize的值,有参考:
############  Critical Memory Settings ############
# System Memory    Server Main Heap Size     Server Compressed Heap Size   Loader Main Heap Size
# 32GB                 24000                      4000                       800
# 16GB                 10000                      1000                       800
#  8GB                  4000                       500                       800
#  4GB                  1300                       400                       400
#  2GB                  600                        250                       320




  4、创建管理MySQL数据库的shell脚本:
vi /data0/mysql/3307/mysql

  输入以下内容(这里的用户名admin和密码12345678接下来的步骤会创建):
#!/bin/sh

mysql_port=3307
mysql_username="admin"
mysql_password="12345678"

function_start_mysql()
{
    printf "Starting MySQL...\n"
    cd /usr/local/infobright/ && /bin/sh ./bin/mysqld_safe --defaults-file=/data0/mysql/${mysql_port}/my.cnf 2>&1 > /dev/null &
}

function_stop_mysql()
{
    printf "Stoping MySQL...\n"
    cd /usr/local/infobright/ && ./bin/mysqladmin -u ${mysql_username} -p${mysql_password} -S /tmp/mysql${mysql_port}.sock shutdown
}

function_restart_mysql()
{
    printf "Restarting MySQL...\n"
    function_stop_mysql
    sleep 5
    function_start_mysql
}

function_kill_mysql()
{
    kill -9 $(ps -ef | grep 'bin/mysqld_safe' | grep ${mysql_port} | awk '{printf $2}')
    kill -9 $(ps -ef | grep 'libexec/mysqld' | grep ${mysql_port} | awk '{printf $2}')
}

if [ "$1" = "start" ]; then
    function_start_mysql
elif [ "$1" = "stop" ]; then
    function_stop_mysql
elif [ "$1" = "restart" ]; then
function_restart_mysql
elif [ "$1" = "kill" ]; then
function_kill_mysql
else
    printf "Usage: /data0/mysql/${mysql_port}/mysql {start|stop|restart|kill}\n"
fi




  5、赋予shell脚本可执行权限:
chmod +x /data0/mysql/3307/mysql




  6、启动MySQL/Infobright:
/data0/mysql/3307/mysql start




  7、通过命令行登录管理MySQL服务器(提示输入密码时直接回车):
/usr/local/infobright/bin/mysql -u root -p -S /tmp/mysql3307.sock




  8、输入以下SQL语句,创建一个具有root权限的用户(admin)和密码(12345678):
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' IDENTIFIED BY '12345678';
GRANT ALL PRIVILEGES ON *.* TO 'admin'@'127.0.0.1' IDENTIFIED BY '12345678';




  9、示例:从普通的MySQL数据库(假设MySQL安装路径为/usr/local/webserver/mysql)导出数据到csv文件:
/usr/local/webserver/mysql/bin/mysql -S /tmp/mysql3306.sock -D tongji_logs -e "select * from log_visits_2010_05_10 into outfile '/data0/test.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"'  ESCAPED BY '\\\' LINES TERMINATED BY '\n';"




  10、示例:普通MySQL和Infobright建表对比
  ①、普通MySQL的InnoDB存储引擎建表:
CREATE TABLE IF NOT EXISTS `log_visits_2010_05_12` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `cate_id` int(11) NOT NULL,
  `site_id` int(11) unsigned NOT NULL,
  `visitor_localtime` char(8) NOT NULL,
  `visitor_idcookie` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `cate_site_id` (`cate_id`,`site_id`),
  KEY `visitor_localtime` (`visitor_localtime`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


  ②、Infobright的BRIGHTHOUSE存储引擎建表:
CREATE TABLE IF NOT EXISTS `log_visits` (
  `id` int(11) NOT NULL,
  `cate_id` int(11) NOT NULL,
  `site_id` int(11) NOT NULL,
  `visitor_localtime` char(8) NOT NULL,
  `visitor_idcookie` varchar(255) NOT NULL,
) ENGINE=BRIGHTHOUSE DEFAULT CHARSET=utf8;

  注:BRIGHTHOUSE存储引擎建表时不能有AUTO_INCREMENT自增、unsigned无符号、unique唯一、主键PRIMARY KEY、索引KEY。



  11、示例:从csv文件导入数据到Infobright数据仓库:
/usr/local/infobright/bin/mysql -S /tmp/mysql3307.sock -D dw --skip-column-names -e "LOAD DATA INFILE '/data0/test.csv' INTO TABLE log_visits_2010_04_13 FIELDS TERMINATED BY ',' ESCAPED BY '\\\' LINES TERMINATED BY '\n';"




  12、示例:普通MySQL和Infobright查询速度对比(共220多万条记录):
  ①、普通MySQL的InnoDB存储引擎(已建索引):
mysql> SELECT config_browser_name, count(*) AS total FROM `browser_info` GROUP BY config_browser_name order by total DESC;
+---------------------+---------+
| config_browser_name | total   |
+---------------------+---------+
| IE                  | 2204016 |
| CH                  |   20650 |
| FF                  |   10475 |
| MO                  |    6147 |
| OT                  |    1631 |
| OP                  |    1282 |
| SF                  |     797 |
| KM                  |       5 |
| KO                  |       2 |
+---------------------+---------+
9 rows in set (1 min 28.13 sec)


  ②、Infobright的BRIGHTHOUSE存储引擎:
mysql> SELECT config_browser_name, count(*) AS total FROM `browser_info` GROUP BY config_browser_name order by total DESC;
+---------------------+---------+
| config_browser_name | total   |
+---------------------+---------+
| IE                  | 2204016 |
| CH                  |   20650 |
| FF                  |   10475 |
| MO                  |    6147 |
| OT                  |    1631 |
| OP                  |    1282 |
| SF                  |     797 |
| KM                  |       5 |
| KO                  |       2 |
+---------------------+---------+
9 rows in set (0.84 sec)




  13、(可选)停止MySQL/Infobright:
/data0/mysql/3307/mysql stop






技术大类 » 数据库技术 | 评论(62) | 引用(0) | 阅读(118604)
老崔 Email Homepage
2010-5-17 14:16
Infobright 存储的数据主要是用来做决策分析和统计的吧。
关于Infobright 数据仓库的技术,大家可以看看:http://tech.techweb.com.cn/thread-389359-1-1.html
逍遥网用它来做什么呢?
ithouge Email Homepage
2010-5-17 14:18
学习!
willko Email Homepage
2010-5-17 14:23
用户行为分析。
怪物宝
2010-5-17 14:25
哈哈哈,太好了
中国人
2010-5-17 15:00
此类数据使用nosql再适合不过了,效率上比mysql Infobright要高。
张宴 回复于 2010-5-17 20:47
NoSQL适合简单的Key-value,不适合复杂的分析型计算。
mydjango Email Homepage
2010-5-17 15:51
哈哈  学习了~
mydjango Email Homepage
2010-5-17 15:51
学习了~
苦咖啡 Homepage
2010-5-17 15:52
貌似b2b网站最适合使用
ElmerZhang Email Homepage
2010-5-17 17:38
收藏了
解梦 Homepage
2010-5-17 20:58
经常来学习一下
Rebill Email Homepage
2010-5-17 23:47
unsigned唯一?不是unsigned无符号、unique唯一吗?呵呵,献丑了。不支持INSERT,已经相当麻烦了。
张宴 回复于 2010-5-18 08:17
不好意思,写错了,已修正。
xiaoyu
2010-5-18 13:23
gringrin
“LOAD DATA INFILE”
刚好和我要做的项目符合。
Mark
2010-5-18 15:17
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql3307.sock' (2)
怎么弄
阿飞 Homepage
2010-5-20 22:20
学习来了
passkey
2010-5-21 11:07
做数据分析大有用处了
新乡家教 Homepage
2010-5-27 18:02
来学习啦
不过小虾没有看明白!
xiaoyu
2010-5-31 00:42
/usr/local/infobright/bin/mysql -S /tmp/mysql3307.sock -D dw --skip-column-names -e "LOAD DATA INFILE '/data0/test.csv' INTO TABLE log_visits_2010_04_13 FIELDS TERMINATED BY ',' ESCAPED BY '\\\' LINES TERMINATED BY '\n';"

ESCAPED BY '\\\'

这里不会报错吗?
lanhun
2010-6-3 10:11
这个能不用3307,合并到3306里面么
qqqq
2010-6-11 11:05
西岭风清 Email Homepage
2010-6-11 14:08
我看到 Infobright 官方介绍,支持的并发用户才 10 - 18个啊;
这玩意如何能用,还不如自己设计一套数据库呢。
张宴 回复于 2010-7-9 21:18
数据仓库本身就是用来做数据挖掘的,并不强调并发。用数据仓库去做并发访问应用,犹如用高射炮打蚊子。
分页: 1/4 第一页 1 2 3 4 下页 最后页
发表评论
表情
emotemotemotemotemot
emotemotemotemotemot
emotemotemotemotemot
emotemotemotemotemot
emotemotemotemotemot
打开HTML
打开UBB
打开表情
隐藏
记住我
昵称   密码   游客无需密码
网址   电邮   [注册]