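The R Sword for NoSQL Series: MongoDB
Because article space is limited, the series skips the NoSQL installation steps; please follow the official documentation to install.
This installment, R with MongoDB, is divided into 4 sections.
MongoDB environment setup
The rmongodb function library
Basic rmongodb operations
rmongodb test case
Each section is split into a "description" part and a "code" part, so that the explanation and the code stay in step.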
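1. MongoDB environment setup
Description:
First, the environment. I chose the 32-bit desktop edition of Linux Ubuntu 12.04; pick whichever Linux distribution you are comfortable with.
MongoDB installation is skipped.
Inspecting the MongoDB server environment
Start MongoDB with the mongod command.
Process ID: pid=2924
Port: port=27017
Data file directory: dbpath=/data/db/
Build: 32-bit
Hostname: host=conan
Open the mongo shell with the mongo command.
Simple mongo shell operations:
list the databases, switch database, list the collections.
The R environment is 2.15.0 on WinXP, which reaches the MongoDB server over a remote connection.
Code:
Check the operating system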
~ uname -a
Linux conan 3.2.0-38-generic-pae #61-Ubuntu SMP Tue Feb 19 12:39:51 UTC 2013 i686 i686 i386 GNU/Linux
~ cat /etc/issue
Ubuntu 12.04.2 LTS \n \l
Start mongodb
~ mongod
mongod --help for help and startup options
Thu Apr 11 11:02:26
Thu Apr 11 11:02:26 warning: 32-bit servers don't have journaling enabled by default. Please use --journal if you want durability.
Thu Apr 11 11:02:26
Thu Apr 11 11:02:26 [initandlisten] MongoDB starting : pid=2924 port=27017 dbpath=/data/db/ 32-bit host=conan
Thu Apr 11 11:02:26 [initandlisten]
Thu Apr 11 11:02:26 [initandlisten] ** NOTE: when using MongoDB 32 bit, you are limited to about 2 gigabytes of data
Thu Apr 11 11:02:26 [initandlisten] ** see http://blog.mongodb.org/post/137788967/32-bit-limitations
Thu Apr 11 11:02:26 [initandlisten] ** with --journal, the limit is lower
Thu Apr 11 11:02:26 [initandlisten]
Thu Apr 11 11:02:26 [initandlisten] db version v2.0.6, pdfile version 4.5
Thu Apr 11 11:02:26 [initandlisten] git version: e1c0cbc25863f6356aa4e31375add7bb49fb05bc
Thu Apr 11 11:02:26 [initandlisten] build info: Linux domU-12-31-39-01-70-B4 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:39:36 EST 2008 i686 BOOST_LIB_VERSION=1_41
Thu Apr 11 11:02:26 [initandlisten] options: {}
Thu Apr 11 11:02:26 [websvr] admin web console waiting for connections on port 28017
Thu Apr 11 11:02:26 [initandlisten] waiting for connections on port 27017
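The pid, port, dbpath and host values listed in the description all come from the single "MongoDB starting" line of the log above. As a small illustration, the base-R sketch below pulls the key=value pairs out of that line. The variable names are my own and nothing here talks to the server.

```r
# Startup line copied from the mongod log above
line <- "Thu Apr 11 11:02:26 [initandlisten] MongoDB starting : pid=2924 port=27017 dbpath=/data/db/ 32-bit host=conan"

# Grab every key=value token, then split each one into name and value
tokens <- regmatches(line, gregexpr("[a-z]+=[^ ]+", line))[[1]]
parts  <- strsplit(tokens, "=", fixed = TRUE)
info   <- setNames(vapply(parts, `[`, character(1), 2),
                   vapply(parts, `[`, character(1), 1))

print(info)
```

The "32-bit" token carries no "=" sign, so it is not picked up by this pattern.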
Open the mongo shell
~ mongo
MongoDB shell version: 2.0.6
connecting to: test
Inside the mongo shell, list the databases
> show dbs
db 0.0625GB
feed 0.0625GB
foobar 0.0625GB
local (empty)
Switch database
> use foobar
switched to db foobar
List the collections
> show collections
blog
system.indexes
R development environment 2.15.0, WinXP
~ R
R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: i386-pc-mingw32/i386 (32-bit)
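2. The rmongodb function library
Description:
rmongodb provides a large set of functions that map onto mongo operations. Compared with the R clients for other NoSQL stores, it is a substantial piece of work. Even so, I find the level of abstraction too low, which makes the code fairly verbose to write.
The code part below lists every function in the rmongodb library; I introduce only a few commonly used ones here.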
Create a mongo connection
mongo<-mongo.create()
Check whether the connection is alive
mongo.is.connected(mongo)
Create a BSON object buffer
buf <- mongo.bson.buffer.create()
Append an element to buf
mongo.bson.buffer.append(buf, "name", "Echo")
Append an object-type element
score <- c(5, 3.5, 4)
names(score) <- c("Mike", "Jimmy", "Ann")
mongo.bson.buffer.append(buf, "score", score)
Append an array-type element
mongo.bson.buffer.start.array(buf, "comments")
mongo.bson.buffer.append(buf, "0", "a1")
mongo.bson.buffer.append(buf, "1", "a2")
mongo.bson.buffer.append(buf, "2", "a3")
Close the array-type element
mongo.bson.buffer.finish.object(buf)
Extract the BSON object from the buffer
b <- mongo.bson.from.buffer(buf)
database.collection
ns="db.blog"
Insert one record
mongo.insert(mongo,ns,b)
#mongo shell:(Not Run)
db.blog.insert(b)
Create the query object
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "name", "Echo")
query <- mongo.bson.from.buffer(buf)
Create the fields object (which fields the query returns)
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "name", 1)
fields <- mongo.bson.from.buffer(buf)
Run a single-record query
mongo.find.one(mongo, ns, query, fields)
#mongo shell:(Not Run)
db.blog.findOne({query},{fields})
Run a multi-record query
mongo.find(mongo, ns, query, fields)
#mongo shell:(Not Run)
db.blog.find({query},{fields})
Create the modifier object objNew
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "$inc")
mongo.bson.buffer.append(buf, "age", 1L)
mongo.bson.buffer.finish.object(buf)
objNew <- mongo.bson.from.buffer(buf)
Run the update
mongo.update(mongo, ns, query, objNew)
#mongo shell:(Not Run)
db.blog.update({query},{objNew})
One-line update (no modifier is used, so the matched document is replaced as a whole)
mongo.update(mongo, ns, query, list(name="Echo", age=25))
#mongo shell:(Not Run)
db.blog.update({query},{name:"Echo",age:25})
Remove the matched objects
mongo.remove(mongo, ns, query)
#mongo shell:(Not Run)
db.blog.remove({query})
Destroy the mongo connection
mongo.destroy(mongo)
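The buffer calls above are verbose for a three-field document. Since mongo.update accepts a plain R list and the library includes mongo.bson.from.list, the same document can probably be written as a list instead. This is a sketch under that assumption, not something the walkthrough above does; the conversion step only runs when rmongodb is installed.

```r
# Same document shape as the buffer calls above, written as a plain R list.
# Assumption: rmongodb turns a named vector into a sub-object (as shown for
# score above) and an unnamed vector of length > 1 into an array.
doc <- list(
  name     = "Echo",
  score    = c(Mike = 5, Jimmy = 3.5, Ann = 4),
  comments = c("a1", "a2", "a3")
)
str(doc)

# Conversion needs the rmongodb package; skipped when it is not installed.
if (requireNamespace("rmongodb", quietly = TRUE)) {
  b <- rmongodb::mongo.bson.from.list(doc)
  print(b)
}
```

Printing b and comparing it with the buffer-built object is a quick way to confirm the two routes produce the same structure before inserting anything.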
Code:
153 functions in total
mongo.add.user
mongo.authenticate
mongo.binary.binary
mongo.binary.function
mongo.binary.md5
mongo.binary.old
mongo.binary.user
mongo.binary.uuid
mongo.bson.array
mongo.bson.binary
mongo.bson.bool
mongo.bson.buffer.append
mongo.bson.buffer.append.bool
mongo.bson.buffer.append.bson
mongo.bson.buffer.append.code
mongo.bson.buffer.append.code.w.scope
mongo.bson.buffer.append.complex
mongo.bson.buffer.append.double
mongo.bson.buffer.append.element
mongo.bson.buffer.append.int
mongo.bson.buffer.append.list
mongo.bson.buffer.append.long
mongo.bson.buffer.append.null
mongo.bson.buffer.append.object
mongo.bson.buffer.append.oid
mongo.bson.buffer.append.raw
mongo.bson.buffer.append.regex
mongo.bson.buffer.append.string
mongo.bson.buffer.append.symbol
mongo.bson.buffer.append.time
mongo.bson.buffer.append.timestamp
mongo.bson.buffer.append.undefined
mongo.bson.buffer.create
mongo.bson.buffer.finish.object
mongo.bson.buffer.size
mongo.bson.buffer.start.array
mongo.bson.buffer.start.object
mongo.bson.code
mongo.bson.code.w.scope
mongo.bson.date
mongo.bson.dbref
mongo.bson.destroy
mongo.bson.double
mongo.bson.empty
mongo.bson.eoo
mongo.bson.find
mongo.bson.from.buffer
mongo.bson.from.list
mongo.bson.int
mongo.bson.iterator.create
mongo.bson.iterator.key
mongo.bson.iterator.next
mongo.bson.iterator.type
mongo.bson.iterator.value
mongo.bson.long
mongo.bson.null
mongo.bson.object
mongo.bson.oid
mongo.bson.print
mongo.bson.regex
mongo.bson.size
mongo.bson.string
mongo.bson.symbol
mongo.bson.timestamp
mongo.bson.to.list
mongo.bson.undefined
mongo.bson.value
mongo.code.create
mongo.code.w.scope.create
mongo.command
mongo.count
mongo.create
mongo.cursor.destroy
mongo.cursor.next
mongo.cursor.value
mongo.destroy
mongo.disconnect
mongo.distinct
mongo.drop
mongo.drop.database
mongo.find
mongo.find.await.data
mongo.find.cursor.tailable
mongo.find.exhaust
mongo.find.no.cursor.timeout
mongo.find.one
mongo.find.oplog.replay
mongo.find.partial.results
mongo.find.slave.ok
mongo.get.database.collections
mongo.get.databases
mongo.get.err
mongo.get.hosts
mongo.get.last.err
mongo.get.prev.err
mongo.get.primary
mongo.get.server.err
mongo.get.server.err.string
mongo.get.socket
mongo.get.timeout
mongo.gridfile.destroy
mongo.gridfile.get.chunk
mongo.gridfile.get.chunk.count
mongo.gridfile.get.chunks
mongo.gridfile.get.chunk.size
mongo.gridfile.get.content.type
mongo.gridfile.get.descriptor
mongo.gridfile.get.filename
mongo.gridfile.get.length
mongo.gridfile.get.md5
mongo.gridfile.get.metadata
mongo.gridfile.get.upload.date
mongo.gridfile.pipe
mongo.gridfile.read
mongo.gridfile.seek
mongo.gridfile.writer.create
mongo.gridfile.writer.finish
mongo.gridfile.writer.write
mongo.gridfs.create
mongo.gridfs.destroy
mongo.gridfs.find
mongo.gridfs.remove.file
mongo.gridfs.store
mongo.gridfs.store.file
mongo.index.background
mongo.index.create
mongo.index.drop.dups
mongo.index.sparse
mongo.index.unique
mongo.insert
mongo.insert.batch
mongo.is.connected
mongo.is.master
mongo.oid.create
mongo.oid.from.string
mongo.oid.print
mongo.oid.time
mongo.oid.to.string
mongo.reconnect
mongo.regex.create
mongo.remove
mongo.rename
mongo.reset.err
mongo.set.timeout
mongo.shorthand
mongo.simple.command
mongo.symbol.create
mongo.timestamp.create
mongo.undefined.create
mongo.update
mongo.update.basic
mongo.update.multi
mongo.update.upsert
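3. Basic rmongodb operations
Description:
First, install the rmongodb package and load it.
Then call mongo.create() to connect to the MongoDB server. For a local connection mongo.create() needs no arguments; the example below connects remotely, so the host argument supplies the IP address: mongo<-mongo.create(host="192.168.1.11")
Check that the connection is alive with mongo.is.connected(). You will use this statement often during development. When modelling in R, if an object or function is used incorrectly, the connection is dropped automatically. Because of the way MongoDB handles exceptions, no message is shown when this happens, so you have to run this command by hand to test whether the connection is still alive.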
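Because a dropped connection is silent, it can help to wrap the check in a small helper. The sketch below is my own, not part of rmongodb: ensure_connected is a hypothetical name, and it assumes mongo.is.connected and mongo.reconnect (both in the function list) behave as their names suggest. The two calls are passed as arguments so the helper can be tried without a server.

```r
# Re-check a connection and try to re-establish it when it has dropped.
# The two function arguments default to the rmongodb calls; they are
# parameters only so the helper can be exercised with stand-ins.
ensure_connected <- function(conn,
                             is_connected = rmongodb::mongo.is.connected,
                             reconnect    = rmongodb::mongo.reconnect) {
  if (!is_connected(conn)) {
    message("MongoDB connection lost, trying to reconnect")
    reconnect(conn)
  }
  invisible(is_connected(conn))
}

# Dry run with a stand-in "connection": an environment holding a flag.
fake <- new.env()
fake$up <- FALSE
ok <- ensure_connected(
  fake,
  is_connected = function(conn) conn$up,
  reconnect    = function(conn) assign("up", TRUE, envir = conn)
)
print(ok)
```

With rmongodb loaded and a real connection object, the call would simply be ensure_connected(mongo), placed before each block of database work.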
Next, define two variables, db and ns. db is the database we want to use; ns is the database name plus the collection name.
Now we create a Mongo object.
{
    "_id" : ObjectId("51663e14da2c51b1e8bc62eb"),
    "name" : "Echo",
    "age" : 22,
    "gender" : "Male",
    "score" : {
        "Mike" : 5,
        "Jimmy" : 3.5,
        "Ann" : 4
    },
    "comments" : [
        "a1",
        "a2",
        "a3"
    ]
}
Then apply the modifiers $inc, $set and $push in turn.
Finally, remove the object and close the connection.
Code:
Install rmongodb
install.packages("rmongodb")
Load the package
library(rmongodb)
Connect to the remote mongodb server
mongo<-mongo.create(host="192.168.1.11")
Check that the connection is alive
print(mongo.is.connected(mongo))
Define db
db<-"foobar"
Define db.collection
ns<-"foobar.blog"
Build the BSON data
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "name", "Echo")
mongo.bson.buffer.append(buf, "age", 22L)
mongo.bson.buffer.append(buf, "gender", 'Male')
# object type
score <- c(5, 3.5, 4)
names(score) <- c("Mike", "Jimmy", "Ann")
mongo.bson.buffer.append(buf, "score", score)
# array type
mongo.bson.buffer.start.array(buf, "comments")
mongo.bson.buffer.append(buf, "0", "a1")
mongo.bson.buffer.append(buf, "1", "a2")
mongo.bson.buffer.append(buf, "2", "a3")
mongo.bson.buffer.finish.object(buf)
b <- mongo.bson.from.buffer(buf)
Insert into mongodb
mongo.insert(mongo,ns,b)
Show the inserted record
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.append(buf, "name", "Echo")
query <- mongo.bson.from.buffer(buf)
print(mongo.find.one(mongo, ns, query))
Use the $inc modifier to add 1 to age
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "$inc")
mongo.bson.buffer.append(buf, "age", 1L)
mongo.bson.buffer.finish.object(buf)
objNew <- mongo.bson.from.buffer(buf)
mongo.update(mongo, ns, query, objNew)
print(mongo.find.one(mongo, ns, query))
Use the $set modifier to set age=1
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "$set")
mongo.bson.buffer.append(buf, "age", 1L)
mongo.bson.buffer.finish.object(buf)
objNew <- mongo.bson.from.buffer(buf)
mongo.update(mongo, ns, query, objNew)
print(mongo.find.one(mongo, ns, query))
Use the $push modifier to append "Orange" to the comments array
buf <- mongo.bson.buffer.create()
mongo.bson.buffer.start.object(buf, "$push")
mongo.bson.buffer.append(buf, "comments", "Orange")
mongo.bson.buffer.finish.object(buf)
objNew <- mongo.bson.from.buffer(buf)
mongo.update(mongo, ns, query, objNew)
print(mongo.find.one(mongo, ns, query))
Use the shorthand update statement to reassign the whole object
mongo.update(mongo, ns, query, list(name="Echo", age=25))
print(mongo.find.one(mongo, ns, query))
Remove the object
mongo.remove(mongo, ns, query)
Destroy the mongo connection
mongo.destroy(mongo)
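The walkthrough above only reads back one document at a time with mongo.find.one. When a query matches several documents, mongo.find returns a cursor instead. A minimal sketch of walking that cursor, using the cursor functions from the list in section 2; fetch_all is a hypothetical helper name, and the function needs the rmongodb package and an open connection, so it is only defined here, not run.

```r
# Collect every document matching `query` into a list of R lists.
# Assumes the rmongodb package and an open connection object `mongo`.
fetch_all <- function(mongo, ns, query = rmongodb::mongo.bson.empty()) {
  cursor <- rmongodb::mongo.find(mongo, ns, query)
  # release the cursor even if an error interrupts the loop
  on.exit(rmongodb::mongo.cursor.destroy(cursor))
  docs <- list()
  while (rmongodb::mongo.cursor.next(cursor)) {
    current <- rmongodb::mongo.cursor.value(cursor)
    docs[[length(docs) + 1]] <- rmongodb::mongo.bson.to.list(current)
  }
  docs
}

# Usage, before the connection is destroyed (not run here):
# docs <- fetch_all(mongo, "foobar.blog")
# length(docs)
```

Converting each document with mongo.bson.to.list keeps the result usable after the cursor has been destroyed.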
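4. rmongodb test case
Description:
Batch-insert data, then batch-update it with the modifiers
Speed comparison of the 3 modifiers.
Because $push operates on arrays, $set on arbitrary values and $inc on numbers, the test below may not be entirely fair. Treat the results as a rough reference only.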
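A comparison like this is easier to read when the elapsed times land in one table. Below is a minimal sketch of such a harness using only base R's system.time(). The helper name time_each and the Sys.sleep stand-ins are hypothetical; in the real test the tasks would wrap the batch update functions instead.

```r
# Run each zero-argument function once and record its elapsed wall time.
time_each <- function(tasks) {
  elapsed <- vapply(tasks,
                    function(f) system.time(f())[["elapsed"]],
                    numeric(1))
  data.frame(task = names(tasks), elapsed = unname(elapsed))
}

# Stand-in workloads so the sketch runs without a MongoDB server.
demo <- time_each(list(
  short_task = function() Sys.sleep(0.05),
  long_task  = function() Sys.sleep(0.15)
))

# Fastest first
print(demo[order(demo$elapsed), ])
```

A single run per task is noisy; repeating each task a few times and taking the median would give steadier numbers.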
Code:
Batch insert function
batch_insert<-function(arr=1:10,ns){
  library(stringr)  # str_c() is also used by the batch update functions below
  # build one BSON document per element of arr
  mongo_insert<-function(x){
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", str_c("Dave",x))
    mongo.bson.buffer.append(buf, "age", x)
    mongo.bson.buffer.start.array(buf, "comments")
    mongo.bson.buffer.append(buf, "0", "a1")
    mongo.bson.buffer.append(buf, "1", "a2")
    mongo.bson.buffer.append(buf, "2", "a3")
    mongo.bson.buffer.finish.object(buf)
    return(mongo.bson.from.buffer(buf))
  }
  # relies on the global connection object mongo
  mongo.insert.batch(mongo, ns, lapply(arr,mongo_insert))
}
Batch update function using the $inc modifier
batch_inc<-function(data,ns){
  for(i in data){
    # criteria: match the document by name
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", str_c("Dave",i))
    criteria <- mongo.bson.from.buffer(buf)
    # modifier: {$inc: {age: 1}}
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.start.object(buf, "$inc")
    mongo.bson.buffer.append(buf, "age", 1L)
    mongo.bson.buffer.finish.object(buf)
    objNew <- mongo.bson.from.buffer(buf)
    mongo.update(mongo, ns, criteria, objNew)
  }
}
Batch update function using the $set modifier
batch_set<-function(data,ns){
  for(i in data){
    # criteria: match the document by name
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", str_c("Dave",i))
    criteria <- mongo.bson.from.buffer(buf)
    # modifier: {$set: {age: 1}}
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.start.object(buf, "$set")
    mongo.bson.buffer.append(buf, "age", 1L)
    mongo.bson.buffer.finish.object(buf)
    objNew <- mongo.bson.from.buffer(buf)
    mongo.update(mongo, ns, criteria, objNew)
  }
}
Batch update function using the $push modifier
batch_push<-function(data,ns){
  for(i in data){
    # criteria: match the document by name
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.append(buf, "name", str_c("Dave",i))
    criteria <- mongo.bson.from.buffer(buf)
    # modifier: {$push: {comments: "Orange"}}
    buf <- mongo.bson.buffer.create()
    mongo.bson.buffer.start.object(buf, "$push")
    mongo.bson.buffer.append(buf, "comments", "Orange")
    mongo.bson.buffer.finish.object(buf)
    objNew <- mongo.bson.from.buffer(buf)
    mongo.update(mongo, ns, criteria, objNew)
  }
}
Run the program and compare the speed of the 3 modifiers; $push is the slowest
ns="foobar.blog"
data=1:1000
mongo.remove(mongo, ns)
## [1] TRUE
system.time(batch_insert(data, ns))
## user system elapsed
## 0.25 0.00 0.28
system.time(batch_inc(data, ns))
## user system elapsed
## 0.47 0.27 2.50
system.time(batch_set(data, ns))
## user system elapsed
## 0.77 0.48 3.17
system.time(batch_push(data, ns))
## user system elapsed
## 0.81 0.41 4.23
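To put the elapsed figures above on a common footing, the short calculation below converts them to milliseconds per document and expresses the three modifiers relative to $inc. It only re-uses the numbers reported in this run; the variable names are my own.

```r
# Elapsed seconds reported above for 1000 documents
elapsed <- c(insert = 0.28, inc = 2.50, set = 3.17, push = 4.23)
n <- 1000

# Average cost per document in milliseconds
per_doc_ms <- elapsed / n * 1000

# Update modifiers relative to $inc (1 = same speed as $inc)
modifiers <- c("inc", "set", "push")
relative_to_inc <- round(elapsed[modifiers] / elapsed[["inc"]], 2)

print(per_doc_ms)
print(relative_to_inc)
```

In this run each update costs a few milliseconds because every document needs its own round trip, while the batch insert sends all 1000 documents in one call.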