Project author: cclient

Project description:
hadoop-on-mongo demo, migrated to spark-on-hadoop-mongo and then to mongo-spark-connector
Language: Java
Project URL: git://github.com/cclient/spark-java-mongo-demo.git
Created: 2015-06-10T08:39:42Z
Project community: https://github.com/cclient/spark-java-mongo-demo

License:


spark-java-mongo-demo

Uses Spark, via mongo-hadoop, to analyze MongoDB data. Written in Java.

MongoDB raw data
{
"_id" : ObjectId("54d83f3548c9bc218e056ce6"),
"apMac" : "aa:bb:cc:dd:ee:ff",
"proto" : "http",
"url" : "extshort.weixin.qq.com",
"clientMac" : "ff:ee:dd:cc:bb:aa"
}
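
The README does not say which aggregation the job computes over these documents. As a purely hypothetical illustration, the plain-Java sketch below models one document and counts records per `url`. The class name `UrlCountSketch`, the `Record` type and the choice of `url` as the grouping key are assumptions, not taken from the repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names come from the repository.
class UrlCountSketch {

    // Plain-Java stand-in for one MongoDB document from the sample above.
    static final class Record {
        final String apMac;
        final String proto;
        final String url;
        final String clientMac;

        Record(String apMac, String proto, String url, String clientMac) {
            this.apMac = apMac;
            this.proto = proto;
            this.url = url;
            this.clientMac = clientMac;
        }
    }

    // Counts how many records share each url, keyed in sorted order.
    static Map<String, Long> countByUrl(List<Record> records) {
        Map<String, Long> counts = new TreeMap<>();
        for (Record r : records) {
            counts.merge(r.url, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Record> records = Arrays.asList(
            new Record("aa:bb:cc:dd:ee:ff", "http", "extshort.weixin.qq.com", "ff:ee:dd:cc:bb:aa"),
            new Record("aa:bb:cc:dd:ee:ff", "http", "extshort.weixin.qq.com", "11:22:33:44:55:66"),
            new Record("aa:bb:cc:dd:ee:ff", "http", "example.com", "ff:ee:dd:cc:bb:aa"));
        System.out.println(countByUrl(records));
    }
}
```

On an RDD the same shape would typically be a `mapToPair` that emits `(url, 1L)` followed by `reduceByKey(Long::sum)`.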

Output

Build
mvn clean scala:compile compile package

How to run
spark-submit --class "sparkfisrttest.cdpspark.App" --packages org.mongodb.mongo-hadoop:mongo-hadoop-core:1.3.1,org.mongodb:mongodb-driver:3.0.1,org.mongodb:mongo-java-driver:3.0.1 ~/hadoop-spark-mongo-examples.jar

The demo was originally based on the MongoDB driver for Hadoop

mongo-hadoop-core

I recently found that there is an official MongoDB connector for Spark

https://docs.mongodb.com/spark-connector/current/

so I added an RDD example based on mongo-spark-connector (I am not yet familiar with the Dataset and SQL APIs)
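
From the caller's side, one visible difference between the two approaches is where the MongoDB input URI is configured. The sketch below only builds the key/value settings each library reads. It assumes mongo-hadoop's `mongo.input.uri` key and the 2.x connector's `spark.mongodb.input.uri` key (newer connector releases use different key names). The class name, method names and host/database/collection values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class, method and argument names are illustrative only.
class MongoInputSettings {

    // Builds a URI of the form mongodb://host/database.collection.
    static String inputUri(String host, String db, String coll) {
        return "mongodb://" + host + "/" + db + "." + coll;
    }

    // mongo-hadoop reads the input location from a Hadoop Configuration entry.
    static Map<String, String> forMongoHadoop(String host, String db, String coll) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mongo.input.uri", inputUri(host, db, coll));
        return conf;
    }

    // mongo-spark-connector 2.x (assumed version) reads it from the Spark configuration.
    static Map<String, String> forSparkConnector(String host, String db, String coll) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.mongodb.input.uri", inputUri(host, db, coll));
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(forMongoHadoop("localhost:27017", "testdb", "logs"));
        System.out.println(forSparkConnector("localhost:27017", "testdb", "logs"));
    }
}
```

With mongo-hadoop, such an entry is set on a Hadoop `Configuration` that is passed to `JavaSparkContext.newAPIHadoopRDD` together with `MongoInputFormat.class`. With the connector, it is set on the `SparkConf` before calling `MongoSpark.load`.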