• 注册
当前位置:1313e > 默认分类 >正文

聊聊flink的MemoryStateBackend

本文主要研究一下flink的MemoryStateBackend

StateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/StateBackend.java

@PublicEvolving
public interface StateBackend extends java.io.Serializable {// ------------------------------------------------------------------------//  Checkpoint storage - the durable persistence of checkpoint data// ------------------------------------------------------------------------/*** Resolves the given pointer to a checkpoint/savepoint into a checkpoint location. The location* supports reading the checkpoint metadata, or disposing the checkpoint storage location.** 

If the state backend cannot understand the format of the pointer (for example because it* was created by a different state backend) this method should throw an {@code IOException}.** @param externalPointer The external checkpoint pointer to resolve.* @return The checkpoint location handle.** @throws IOException Thrown, if the state backend does not understand the pointer, or if* the pointer could not be resolved due to an I/O error.*/CompletedCheckpointStorageLocation resolveCheckpoint(String externalPointer) throws IOException;/*** Creates a storage for checkpoints for the given job. The checkpoint storage is* used to write checkpoint data and metadata.** @param jobId The job to store checkpoint data for.* @return A checkpoint storage for the given job.** @throws IOException Thrown if the checkpoint storage cannot be initialized.*/CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException;// ------------------------------------------------------------------------// Structure Backends // ------------------------------------------------------------------------/*** Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding keyed state* and checkpointing it. Uses default TTL time provider.**

Keyed State is state where each value is bound to a key.** @param The type of the keys by which the state is organized.** @return The Keyed State Backend for the given job, operator, and key group range.** @throws Exception This method may forward all exceptions that occur while instantiating the backend.*/default AbstractKeyedStateBackend createKeyedStateBackend(Environment env,JobID jobID,String operatorIdentifier,TypeSerializer keySerializer,int numberOfKeyGroups,KeyGroupRange keyGroupRange,TaskKvStateRegistry kvStateRegistry) throws Exception {return createKeyedStateBackend(env,jobID,operatorIdentifier,keySerializer,numberOfKeyGroups,keyGroupRange,kvStateRegistry,TtlTimeProvider.DEFAULT);}/*** Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding keyed state* and checkpointing it.**

Keyed State is state where each value is bound to a key.** @param The type of the keys by which the state is organized.** @return The Keyed State Backend for the given job, operator, and key group range.** @throws Exception This method may forward all exceptions that occur while instantiating the backend.*/default AbstractKeyedStateBackend createKeyedStateBackend(Environment env,JobID jobID,String operatorIdentifier,TypeSerializer keySerializer,int numberOfKeyGroups,KeyGroupRange keyGroupRange,TaskKvStateRegistry kvStateRegistry,TtlTimeProvider ttlTimeProvider) throws Exception {return createKeyedStateBackend(env,jobID,operatorIdentifier,keySerializer,numberOfKeyGroups,keyGroupRange,kvStateRegistry,ttlTimeProvider,new UnregisteredMetricsGroup());}/*** Creates a new {@link AbstractKeyedStateBackend} that is responsible for holding keyed state* and checkpointing it.**

Keyed State is state where each value is bound to a key.** @param The type of the keys by which the state is organized.** @return The Keyed State Backend for the given job, operator, and key group range.** @throws Exception This method may forward all exceptions that occur while instantiating the backend.*/ AbstractKeyedStateBackend createKeyedStateBackend(Environment env,JobID jobID,String operatorIdentifier,TypeSerializer keySerializer,int numberOfKeyGroups,KeyGroupRange keyGroupRange,TaskKvStateRegistry kvStateRegistry,TtlTimeProvider ttlTimeProvider,MetricGroup metricGroup) throws Exception;/*** Creates a new {@link OperatorStateBackend} that can be used for storing operator state.**

Operator state is state that is associated with parallel operator (or function) instances,* rather than with keys.** @param env The runtime environment of the executing task.* @param operatorIdentifier The identifier of the operator whose state should be stored.** @return The OperatorStateBackend for operator identified by the job and operator identifier.** @throws Exception This method may forward all exceptions that occur while instantiating the backend.*/OperatorStateBackend createOperatorStateBackend(Environment env, String operatorIdentifier) throws Exception; } 复制代码

  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed
  • flink目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • MemoryStateBackend可以在构造器中指定大小,默认是5MB,可以增大但是不能超过akka frame size;FsStateBackend模式把TaskManager的state存储在内存,但是可以把checkpoint的state存储到filesystem中(比如HDFS);RocksDBStateBackend把working state存储在RocksDB中,checkpoint的state存储在filesystem
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的
  • StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend

AbstractStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/AbstractStateBackend.java

/*** An abstract base implementation of the {@link StateBackend} interface.** 

This class has currently no contents and only kept to not break the prior class hierarchy for users.*/ @PublicEvolving public abstract class AbstractStateBackend implements StateBackend, java.io.Serializable {private static final long serialVersionUID = 4620415814639230247L;// ------------------------------------------------------------------------// State Backend - State-Holding Backends// ------------------------------------------------------------------------@Overridepublic abstract AbstractKeyedStateBackend createKeyedStateBackend(Environment env,JobID jobID,String operatorIdentifier,TypeSerializer keySerializer,int numberOfKeyGroups,KeyGroupRange keyGroupRange,TaskKvStateRegistry kvStateRegistry,TtlTimeProvider ttlTimeProvider,MetricGroup metricGroup) throws IOException;@Overridepublic abstract OperatorStateBackend createOperatorStateBackend(Environment env,String operatorIdentifier) throws Exception; } 复制代码

  • AbstractStateBackend声明实现StateBackend及Serializable接口,这里没有新增其他内容

AbstractFileStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/filesystem/AbstractFileStateBackend.java

@PublicEvolving
public abstract class AbstractFileStateBackend extends AbstractStateBackend {private static final long serialVersionUID = 1L;// ------------------------------------------------------------------------//  State Backend Properties// ------------------------------------------------------------------------/** The path where checkpoints will be stored, or null, if none has been configured. */@Nullableprivate final Path baseCheckpointPath;/** The path where savepoints will be stored, or null, if none has been configured. */@Nullableprivate final Path baseSavepointPath;//......// ------------------------------------------------------------------------//  Initialization and metadata storage// ------------------------------------------------------------------------@Overridepublic CompletedCheckpointStorageLocation resolveCheckpoint(String pointer) throws IOException {return AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer);}// ------------------------------------------------------------------------//  Utilities// ------------------------------------------------------------------------/*** Checks the validity of the path's scheme and path.** @param path The path to check.* @return The URI as a Path.** @throws IllegalArgumentException Thrown, if the URI misses scheme or path.*/private static Path validatePath(Path path) {final URI uri = path.toUri();final String scheme = uri.getScheme();final String pathPart = uri.getPath();// some validity checksif (scheme == null) {throw new IllegalArgumentException("The scheme (hdfs://, file://, etc) is null. " +"Please specify the file system scheme explicitly in the URI.");}if (pathPart == null) {throw new IllegalArgumentException("The path to store the checkpoint data in is null. " +"Please specify a directory path for the checkpoint data.");}if (pathPart.length() == 0 || pathPart.equals("/")) {throw new IllegalArgumentException("Cannot use the root directory for checkpoints.");}return path;}@Nullableprivate static Path parameterOrConfigured(@Nullable Path path, Configuration config, ConfigOption option) {if (path != null) {return path;}else {String configValue = config.getString(option);try {return configValue == null ? null : new Path(configValue);}catch (IllegalArgumentException e) {throw new IllegalConfigurationException("Cannot parse value for " + option.key() +" : " + configValue + " . Not a valid path.");}}}
}
复制代码
  • AbstractFileStateBackend继承了AbstractStateBackend,它有baseCheckpointPath、baseSavepointPath两个属性,允许为null,路径的格式为hdfs://或者file://开头;resolveCheckpoint方法用于解析checkpoint或savepoint的location,这里使用AbstractFsCheckpointStorage.resolveCheckpointPointer(pointer)来完成

MemoryStateBackend

flink-runtime_2.11-1.7.0-sources.jar!/org/apache/flink/runtime/state/memory/MemoryStateBackend.java

@PublicEvolving
public class MemoryStateBackend extends AbstractFileStateBackend implements ConfigurableStateBackend {private static final long serialVersionUID = 4109305377809414635L;/** The default maximal size that the snapshotted memory state may have (5 MiBytes). */public static final int DEFAULT_MAX_STATE_SIZE = 5 * 1024 * 1024;/** The maximal size that the snapshotted memory state may have. */private final int maxStateSize;/** Switch to chose between synchronous and asynchronous snapshots.* A value of 'UNDEFINED' means not yet configured, in which case the default will be used. */private final TernaryBoolean asynchronousSnapshots;// ------------------------------------------------------------------------/*** Creates a new memory state backend that accepts states whose serialized forms are* up to the default state size (5 MB).** 

Checkpoint and default savepoint locations are used as specified in the* runtime configuration.*/public MemoryStateBackend() {this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED);}/*** Creates a new memory state backend that accepts states whose serialized forms are* up to the default state size (5 MB). The state backend uses asynchronous snapshots* or synchronous snapshots as configured.**

Checkpoint and default savepoint locations are used as specified in the* runtime configuration.** @param asynchronousSnapshots Switch to enable asynchronous snapshots.*/public MemoryStateBackend(boolean asynchronousSnapshots) {this(null, null, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.fromBoolean(asynchronousSnapshots));}/*** Creates a new memory state backend that accepts states whose serialized forms are* up to the given number of bytes.**

Checkpoint and default savepoint locations are used as specified in the* runtime configuration.**

WARNING: Increasing the size of this value beyond the default value* ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.* The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there* and the JobManager needs to be able to hold all aggregated state in its memory.** @param maxStateSize The maximal size of the serialized state*/public MemoryStateBackend(int maxStateSize) {this(null, null, maxStateSize, TernaryBoolean.UNDEFINED);}/*** Creates a new memory state backend that accepts states whose serialized forms are* up to the given number of bytes and that uses asynchronous snashots as configured.**

Checkpoint and default savepoint locations are used as specified in the* runtime configuration.**

WARNING: Increasing the size of this value beyond the default value* ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.* The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there* and the JobManager needs to be able to hold all aggregated state in its memory.** @param maxStateSize The maximal size of the serialized state* @param asynchronousSnapshots Switch to enable asynchronous snapshots.*/public MemoryStateBackend(int maxStateSize, boolean asynchronousSnapshots) {this(null, null, maxStateSize, TernaryBoolean.fromBoolean(asynchronousSnapshots));}/*** Creates a new MemoryStateBackend, setting optionally the path to persist checkpoint metadata* to, and to persist savepoints to.** @param checkpointPath The path to write checkpoint metadata to. If null, the value from* the runtime configuration will be used.* @param savepointPath The path to write savepoints to. If null, the value from* the runtime configuration will be used.*/public MemoryStateBackend(@Nullable String checkpointPath, @Nullable String savepointPath) {this(checkpointPath, savepointPath, DEFAULT_MAX_STATE_SIZE, TernaryBoolean.UNDEFINED);}/*** Creates a new MemoryStateBackend, setting optionally the paths to persist checkpoint metadata* and savepoints to, as well as configuring state thresholds and asynchronous operations.**

WARNING: Increasing the size of this value beyond the default value* ({@value #DEFAULT_MAX_STATE_SIZE}) should be done with care.* The checkpointed state needs to be send to the JobManager via limited size RPC messages, and there* and the JobManager needs to be able to hold all aggregated state in its memory.** @param checkpointPath The path to write checkpoint metadata to. If null, the value from* the runtime configuration will be used.* @param savepointPath The path to write savepoints to. If null, the value from* the runtime configuration will be used.* @param maxStateSize The maximal size of the serialized state.* @param asynchronousSnapshots Flag to switch between synchronous and asynchronous* snapshot mode. If null, the value configured in the* runtime configuration will be used.*/public MemoryStateBackend(@Nullable String checkpointPath,@Nullable String savepointPath,int maxStateSize,TernaryBoolean asynchronousSnapshots) {super(checkpointPath == null ? null : new Path(checkpointPath),savepointPath == null ? null : new Path(savepointPath));checkArgument(maxStateSize > 0, "maxStateSize must be > 0");this.maxStateSize = maxStateSize;this.asynchronousSnapshots = asynchronousSnapshots;}/*** Private constructor that creates a re-configured copy of the state backend.** @param original The state backend to re-configure* @param configuration The configuration*/private MemoryStateBackend(MemoryStateBackend original, Configuration configuration) {super(original.getCheckpointPath(), original.getSavepointPath(), configuration);this.maxStateSize = original.maxStateSize;// if asynchronous snapshots were configured, use that setting,// else check the configurationthis.asynchronousSnapshots = original.asynchronousSnapshots.resolveUndefined(configuration.getBoolean(CheckpointingOptions.ASYNC_SNAPSHOTS));}// ------------------------------------------------------------------------// Properties// ------------------------------------------------------------------------/*** Gets the maximum size that an individual state can have, as configured in the* constructor (by default {@value #DEFAULT_MAX_STATE_SIZE}).** @return The maximum size that an individual state can have*/public int getMaxStateSize() {return maxStateSize;}/*** Gets whether the key/value data structures are asynchronously snapshotted.**

If not explicitly configured, this is the default value of* {@link CheckpointingOptions#ASYNC_SNAPSHOTS}.*/public boolean isUsingAsynchronousSnapshots() {return asynchronousSnapshots.getOrDefault(CheckpointingOptions.ASYNC_SNAPSHOTS.defaultValue());}// ------------------------------------------------------------------------// Reconfiguration// ------------------------------------------------------------------------/*** Creates a copy of this state backend that uses the values defined in the configuration* for fields where that were not specified in this state backend.** @param config the configuration* @return The re-configured variant of the state backend*/@Overridepublic MemoryStateBackend configure(Configuration config) {return new MemoryStateBackend(this, config);}// ------------------------------------------------------------------------// checkpoint state persistence// ------------------------------------------------------------------------@Overridepublic CheckpointStorage createCheckpointStorage(JobID jobId) throws IOException {return new MemoryBackendCheckpointStorage(jobId, getCheckpointPath(), getSavepointPath(), maxStateSize);}// ------------------------------------------------------------------------// state holding structures// ------------------------------------------------------------------------@Overridepublic OperatorStateBackend createOperatorStateBackend(Environment env,String operatorIdentifier) throws Exception {return new DefaultOperatorStateBackend(env.getUserClassLoader(),env.getExecutionConfig(),isUsingAsynchronousSnapshots());}@Overridepublic AbstractKeyedStateBackend createKeyedStateBackend(Environment env,JobID jobID,String operatorIdentifier,TypeSerializer keySerializer,int numberOfKeyGroups,KeyGroupRange keyGroupRange,TaskKvStateRegistry kvStateRegistry,TtlTimeProvider ttlTimeProvider,MetricGroup metricGroup) {TaskStateManager taskStateManager = env.getTaskStateManager();HeapPriorityQueueSetFactory priorityQueueSetFactory =new HeapPriorityQueueSetFactory(keyGroupRange, numberOfKeyGroups, 128);return new HeapKeyedStateBackend<>(kvStateRegistry,keySerializer,env.getUserClassLoader(),numberOfKeyGroups,keyGroupRange,isUsingAsynchronousSnapshots(),env.getExecutionConfig(),taskStateManager.createLocalRecoveryConfig(),priorityQueueSetFactory,ttlTimeProvider);}// ------------------------------------------------------------------------// utilities// ------------------------------------------------------------------------@Overridepublic String toString() {return "MemoryStateBackend (data in heap memory / checkpoints to JobManager) " +"(checkpoints: '" + getCheckpointPath() +"', savepoints: '" + getSavepointPath() +"', asynchronous: " + asynchronousSnapshots +", maxStateSize: " + maxStateSize + ")";} } 复制代码

  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中(但是为了高可用,也可以设置checkpoint state存储到filesystem);MemoryStateBackend仅仅用来做实验用途,比如本地启动或者所需的state非常小,对于生产需要改为使用FsStateBackend(将TaskManager的working state存储在内存,但是将JobManager的checkpoint state存储到文件系统以支持更大的state存储)
  • MemoryStateBackend有个maxStateSize属性(默认DEFAULT_MAX_STATE_SIZE为5MB),每个state的大小不能超过maxStateSize,一个task的所有state不能超过RPC系统的限制(默认是10MB,可以修改但不建议),所有retained checkpoints的state大小总和不能超过JobManager的JVM heap大小;另外如果创建MemoryStateBackend时未指定checkpointPath及savepointPath,则会从flink-conf.yaml中读取全局默认值;MemoryStateBackend里头还有一个asynchronousSnapshots属性,是TernaryBoolean类型(TRUE、FALSE、UNDEFINED),其中UNDEFINED表示没有配置,将会使用默认值
  • MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

小结

  • StateBackend接口定义了有状态的streaming应用的state是如何stored以及checkpointed;目前内置支持MemoryStateBackend、FsStateBackend、RocksDBStateBackend三种,如果没有配置默认为MemoryStateBackend;在flink-conf.yaml里头可以进行全局的默认配置,不过具体每个job还可以通过StreamExecutionEnvironment.setStateBackend来覆盖全局的配置
  • StateBackend接口定义了createCheckpointStorage、createKeyedStateBackend、createOperatorStateBackend方法;同时继承了Serializable接口;StateBackend接口的实现要求是线程安全的;StateBackend有个直接实现的抽象类AbstractStateBackend,而AbstractFileStateBackend及RocksDBStateBackend继承了AbstractStateBackend,之后MemoryStateBackend、FsStateBackend都继承了AbstractFileStateBackend
  • MemoryStateBackend继承了AbstractFileStateBackend,实现ConfigurableStateBackend接口(configure方法);它将TaskManager的working state及JobManager的checkpoint state存储在JVM heap中;MemoryStateBackend的createCheckpointStorage创建的是MemoryBackendCheckpointStorage;createOperatorStateBackend方法创建的是OperatorStateBackend;createKeyedStateBackend方法创建的是HeapKeyedStateBackend

doc

  • State Backends

本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 162202241@qq.com 举报,一经查实,本站将立刻删除。

最新评论

欢迎您发表评论:

请登录之后再进行评论

登录