Downloading data from HTTP using Python Spark Streaming

Question

I am new to PySpark. I have installed Kafka with a single node and a single broker on Ubuntu 14.04.

After installation, I tested sending and receiving data with Kafka using kafka-console-producer and kafka-console-consumer.

These are the steps I followed. Start the consumer to consume messages:

 bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatopic --from-beginning

Start the producer in a new terminal window to send messages:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic kafkatopic
[2016-09-25 7:26:58,179] WARN Property topic is not valid (kafka.utils.VerifiableProperties)
Good morning 
Future big data
this is test message

In the consumer terminal:

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic kafkatopic --from-beginning
Good morning 
Future big data
this is test message

The following link from meetup.com produces streaming data:

http://stream.meetup.com/2/rsvps
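
Assuming the endpoint serves newline-delimited JSON with one RSVP object per line (the Meetup stream API may no longer be available), a minimal standard-library sketch can read the HTTP response incrementally and parse each record as it arrives. The helper names `iter_json_lines`, `city_of` and `stream_rsvps` are hypothetical, and the `group` / `group_city` field path is an assumption about the RSVP format, not something confirmed by the question.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

STREAM_URL = "http://stream.meetup.com/2/rsvps"


def iter_json_lines(lines: Iterable[bytes]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line, skipping malformed lines."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # keep-alive blank lines carry no data
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue  # partial or corrupt line: drop it instead of crashing


def city_of(rsvp: dict) -> Optional[str]:
    """Assumed field layout: {"group": {"group_city": ...}}."""
    return (rsvp.get("group") or {}).get("group_city")


def stream_rsvps(url: str = STREAM_URL) -> Iterator[dict]:
    # The HTTP response object can be iterated line by line, so records are
    # handled as they arrive rather than after the whole download finishes.
    with urllib.request.urlopen(url) as response:
        yield from iter_json_lines(response)
```

For example, `for rsvp in stream_rsvps(): print(city_of(rsvp))` prints one city per incoming RSVP until the connection closes.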

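My requirement is to use Kafka to collect the streaming data from this HTTP site into Spark. What is the transformation command for downloading the streaming data?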
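
Kafka brokers do not fetch data from an HTTP URL themselves; something has to produce into the topic. One common pattern is a small bridge process that reads the stream and publishes each line to the topic Spark will consume. This sketch assumes the third-party `kafka-python` client (`KafkaProducer`) and reuses the broker address and `kafkatopic` topic from the console test; `forward_lines` and `bridge_http_to_kafka` are hypothetical names.

```python
import urllib.request
from typing import Callable, Iterable

STREAM_URL = "http://stream.meetup.com/2/rsvps"
BOOTSTRAP = "localhost:9092"  # broker address used with kafka-console-producer
TOPIC = "kafkatopic"          # topic from the console test


def forward_lines(lines: Iterable[bytes], send: Callable[[bytes], object]) -> int:
    """Pass each non-empty, stripped line to `send`; return how many were sent."""
    sent = 0
    for raw in lines:
        raw = raw.strip()
        if raw:  # skip keep-alive blank lines
            send(raw)
            sent += 1
    return sent


def bridge_http_to_kafka(url: str = STREAM_URL) -> int:
    # Assumption: kafka-python is installed (pip install kafka-python).
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
    try:
        with urllib.request.urlopen(url) as response:
            # The response is iterated line by line as data arrives.
            return forward_lines(response, lambda msg: producer.send(TOPIC, msg))
    finally:
        producer.flush()  # deliver anything still buffered
        producer.close()
```

While `bridge_http_to_kafka()` runs, the `kafka-console-consumer.sh` command from the question should print the raw RSVP JSON lines. Keeping `forward_lines` independent of Kafka makes the forwarding logic easy to check with a plain list.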

Once the data is downloaded, I want to compute counts by city, along with other analyses, over specific time intervals.
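
For the per-city counts over a time interval, one option (not necessarily the answer's approach) is Spark Structured Streaming reading `kafkatopic` directly and grouping by a tumbling `window` and city. This is a sketch under stated assumptions: Spark 3.1+ with the `spark-sql-kafka-0-10` connector available, RSVP fields named `mtime` (epoch milliseconds) and `group.group_city`, and a one-minute window chosen only for illustration. `run_spark_job` and `tumbling_city_counts` are hypothetical names; the latter is a plain-Python mirror of the same aggregation so the windowing logic can be checked without a cluster.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

BOOTSTRAP = "localhost:9092"  # broker from the console test in the question
TOPIC = "kafkatopic"
WINDOW = "1 minute"           # Spark interval string for the tumbling window
WINDOW_SECONDS = 60           # the same window length in seconds


def tumbling_city_counts(
    events: Iterable[Tuple[float, Optional[str]]],
    window_seconds: int = WINDOW_SECONDS,
) -> Dict[Tuple[int, Optional[str]], int]:
    """Plain-Python reference for the Spark aggregation in run_spark_job.

    events are (epoch_seconds, city) pairs; windows are aligned to the epoch,
    as Spark's tumbling windows are by default. A missing city is counted
    under None, much as Spark groups null keys together.
    """
    counts: Counter = Counter()
    for ts, city in events:
        start = int(ts // window_seconds) * window_seconds
        counts[(start, city)] += 1
    return dict(counts)


def run_spark_job() -> None:
    """Count RSVPs per city per window from the Kafka topic (needs a Spark setup)."""
    # Assumptions: Spark 3.1+ (for timestamp_seconds) and the
    # spark-sql-kafka-0-10 connector supplied, e.g. via spark-submit --packages.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.appName("meetup-city-counts").getOrCreate()

    # Only the fields used here; the field names are assumed, not verified.
    schema = StructType([
        StructField("mtime", LongType()),  # RSVP time, assumed epoch millis
        StructField("group", StructType([
            StructField("group_city", StringType()),
        ])),
    ])

    rsvps = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", BOOTSTRAP)
        .option("subscribe", TOPIC)
        .load()
        # Kafka delivers the message body as binary in the `value` column.
        .select(F.from_json(F.col("value").cast("string"), schema).alias("r"))
        .select(
            F.timestamp_seconds(F.col("r.mtime") / 1000).alias("ts"),
            F.col("r.group.group_city").alias("city"),
        )
    )

    counts = rsvps.groupBy(F.window(F.col("ts"), WINDOW), F.col("city")).count()

    # "complete" mode re-prints the full table of window/city counts each batch.
    query = (
        counts.writeStream.outputMode("complete")
        .format("console")
        .option("truncate", False)
        .start()
    )
    query.awaitTermination()
```

`run_spark_job()` would be called from a script launched with `spark-submit` together with the Kafka connector package matching your Spark and Scala versions. Other analyses can reuse the parsed `rsvps` stream with different `groupBy` keys or window lengths.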

Answer 1

There are different ways to handle real-time streaming. The one I am considering is shown below.
