Last active
November 3, 2019 06:10
Apache Spark personal study notes
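| ### Advantages of Spark | |
| - The biggest problem with Hadoop MapReduce is that it generates a large amount of disk I/O | |
| - Spark computes in memory, so it is generally faster than Hadoop 2.x MapReduce | |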
| - Apache Spark™ is a unified analytics engine for large-scale data processing. | |
| - Ease of use | |
| - Generality | |
| - Compatibility: works with Hadoop | |
| - A MapReduce-based compute engine writes its results to disk, both for storage and for fault tolerance | |
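
The I/O difference described above can be illustrated with a toy model in plain Python. This is not Spark or Hadoop code: `run_disk_pipeline`, `run_memory_pipeline`, and the two stage functions are hypothetical names chosen for this sketch, which only counts how often intermediate results touch disk in a two-stage word count.

```python
import json
import os
import tempfile


def tokenize(lines):
    """Stage 1: split each line into lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def count_words(words):
    """Stage 2: count occurrences of each word."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


STAGES = [tokenize, count_words]


def run_disk_pipeline(data, workdir):
    """MapReduce-style (simplified): every stage persists its output
    to disk, and the next stage reads it back."""
    disk_ops = 0
    for i, stage in enumerate(STAGES):
        result = stage(data)
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:   # write the stage result
            json.dump(result, f)
        with open(path) as f:        # reload it for the next stage
            data = json.load(f)
        disk_ops += 2
    return data, disk_ops


def run_memory_pipeline(data):
    """Spark-style (simplified): intermediate results stay in memory."""
    for stage in STAGES:
        data = stage(data)
    return data, 0


lines = ["spark is fast", "hadoop is reliable", "spark is general"]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_ops = run_disk_pipeline(lines, workdir)
mem_result, mem_ops = run_memory_pipeline(lines)

# Same answer either way; only the number of disk operations differs.
print(disk_result == mem_result, disk_ops, mem_ops)
```

Both pipelines produce the same counts; the disk-backed one pays a write and a read per stage, which is the cost these notes attribute to MapReduce. A real Spark job can still spill to disk when data does not fit in memory, so treat this as an illustration of the default behavior only.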
| ### Spark architecture | |
| - | |
| ### Spark cluster | |
| Standalone mode and pseudo-distributed mode (on a virtual machine) | |
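| ### Spark streaming data processing | |
| Spark Streaming is not true record-at-a-time stream processing; it is near-real-time stream processing that handles the stream as a series of small batches | |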
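
To make "near-real-time" concrete, here is a minimal sketch of micro-batching in plain Python. It does not use the Spark Streaming API; `micro_batches`, the sample events, and the one-second interval are assumptions made for illustration.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-width time windows.

    A batch is processed only after its interval has closed, which is
    why results arrive with a small delay instead of per record.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        batch_id = int(timestamp // batch_interval)
        batches[batch_id].append(value)
    # Return non-empty batches in time order.
    return [batches[k] for k in sorted(batches)]


# Hypothetical events: (arrival time in seconds, payload)
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]

for batch in micro_batches(events, batch_interval=1.0):
    # A real engine would launch one small batch job per interval here.
    print(len(batch), batch)
```

The latency of such a design is bounded below by the batch interval, whereas a record-at-a-time engine such as Storm can react to each event as it arrives.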
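| ### Spark stream computing | |
| Characteristics: data arrives continuously and without end, and is computed in real time | |
| Representative systems: Apache Storm, Spark Streaming, JStorm | |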
| ### Apache Storm | |
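| Master node: Nimbus; worker nodes: Supervisor | |
| Component types in Storm | |
| 1. Spout: ingests data | |
| 2. Bolt: processes data. Bolts can be chained, so once one bolt finishes, its output is handed to the bolts after it | |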
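
The spout and chained bolts above can be sketched with Python generators. This is a conceptual model only: Storm's real API is Java-based and looks different, and `sentence_spout`, `split_bolt`, and `length_bolt` are hypothetical names.

```python
def sentence_spout():
    """Spout: the source of the stream (a fixed list, for illustration)."""
    for sentence in ["storm processes streams", "bolts process tuples"]:
        yield sentence


def split_bolt(sentences):
    """First bolt: split each incoming sentence into words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def length_bolt(words):
    """Second bolt: consume the first bolt's output, emit (word, length)."""
    for word in words:
        yield (word, len(word))


# Wire the topology: spout -> split_bolt -> length_bolt
topology = length_bolt(split_bolt(sentence_spout()))
results = list(topology)
print(results)
```

Because generators are lazy, each sentence flows through both bolts before the next one is pulled from the spout, which loosely mirrors how tuples move through a chain of bolts.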