
HDFS Architecture Guide

HDFS (Hadoop Distributed File System): a distributed file system designed to run on commodity hardware.

 

1. Assumptions & Goals

- Fault tolerance: data remains accessible even when some servers fail.

- Works with various Hadoop ecosystem applications, including those built around streaming data access.

- Emphasizes high data throughput over low-latency access.

- Handles very large datasets: files of terabytes in size.

 

2. Blocks

- A file is split into one or more "blocks". (File == sequence of blocks)

- Each block is replicated according to the file's replication factor, and the replicas are spread across different DataNodes.

- Default size of a block: 64 MB (raised to 128 MB in Hadoop 2.x and later)
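The file-to-block mapping above can be sketched in a few lines. This is an illustration only, not HDFS code; the 128 MB constant assumes the Hadoop 2.x default block size.

```python
# Sketch: how a file of a given size maps to HDFS blocks.
# BLOCK_SIZE assumes the Hadoop 2.x default (earlier releases used 64 MB).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block for a file of file_size bytes.

    Every block is full-sized except possibly the last one, which holds
    whatever remains of the file.
    """
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks


# A 300 MB file becomes two full 128 MB blocks plus a 44 MB tail block.
sizes = split_into_blocks(300 * 1024 * 1024)
```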

 

3. NameNode, DataNode

- NameNode

    - Master server

    - Manages the filesystem namespace and block metadata (replication factor, block-to-DataNode mapping)

    - Communicates with DataNodes (via heartbeats) to check their state/availability

    - SPOF (Single Point Of Failure): if the NameNode fails, the whole filesystem becomes unavailable (in classic, non-HA deployments).

- DataNode

    - Stores and serves the actual data blocks; creates, deletes, and replicates blocks on instruction from the NameNode

- Rack: a group of "physically close" DataNodes, typically 20~30 nodes sharing the same network switch.

    - Common replication policy: with a replication factor of 3, two replicas are placed on different nodes within one rack, and the third on a node in another rack.

    - Effect: in-rack transfers enjoy high network bandwidth and low latency, while the off-rack replica still survives a whole-rack failure.
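The rack-aware placement policy above can be sketched as follows. This is a simplified illustration, not the actual HDFS placement code: the `topology` dict shape and function names are made up, and tie-breaking (HDFS prefers the writer's local node for the first replica) is only loosely modeled.

```python
import random


def place_replicas(writer_node: str, topology: dict[str, list[str]]) -> list[str]:
    """Pick 3 nodes for a block, following the common HDFS policy:

    - 1st replica on the writer's own node (its local rack),
    - 2nd and 3rd replicas on two different nodes in one remote rack,

    so that two replicas share a rack and one sits in an external rack.
    `topology` maps rack name -> list of node names (hypothetical shape).
    """
    # Find which rack the writer lives in.
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    # Pick a remote rack with room for two replicas.
    remote_rack = next(
        r for r in topology if r != local_rack and len(topology[r]) >= 2
    )
    return [writer_node] + random.sample(topology[remote_rack], 2)


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
replicas = place_replicas("n1", topology)
# replicas[0] is the local node; the other two share the remote rack.
```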

 

4. Typical Failures

- NameNode failure

- DataNode failure

- Network Partitions: a group of DataNodes loses connectivity to the NameNode; the NameNode detects this through missed heartbeats, marks those DataNodes dead, and stops routing I/O to them (triggering re-replication of their blocks).
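Heartbeat-based failure detection, which covers both DataNode failures and network partitions, can be sketched like this. The class and method names are invented for illustration; the 10-minute threshold loosely mirrors the default interval after which HDFS considers a DataNode dead.

```python
class HeartbeatTracker:
    """Toy model of NameNode-side liveness tracking (not real HDFS code).

    A DataNode that has not sent a heartbeat within `dead_interval_secs`
    is declared dead; its blocks must then be re-replicated elsewhere.
    """

    def __init__(self, dead_interval_secs: float = 600.0):
        self.dead_interval = dead_interval_secs
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, datanode: str, now: float) -> None:
        """Record a heartbeat from a DataNode at time `now` (seconds)."""
        self.last_seen[datanode] = now

    def dead_nodes(self, now: float) -> set[str]:
        """Return DataNodes whose last heartbeat is older than the threshold."""
        return {
            dn for dn, t in self.last_seen.items()
            if now - t > self.dead_interval
        }


tracker = HeartbeatTracker()
tracker.heartbeat("dn1", 0.0)
tracker.heartbeat("dn2", 500.0)
# At t=700s, dn1 is 700s stale (dead); dn2 is only 200s stale (alive).
dead = tracker.dead_nodes(700.0)
```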