The newest release of the Apache Flink framework is here. The community's efforts over the past two releases have come to fruition in the new state schema evolution story.
Apache Flink is a “framework and distributed processing engine for stateful computations over unbounded and bounded data streams”. Its use cases include event-driven applications, data analytics applications, and data pipeline applications.
On April 9, 2019, the latest release became available. Version 1.8.0 resolves more than 420 issues and adds new features and improvements.
Flink v1.8.0 new features
The release announcement by Aljoscha Krettek states that this new version of Apache Flink brings the project “closer to our goals of enabling fast data processing and building data-intensive applications for the Flink community in a seamless way”.
Some of the newest features include:
- State schema evolution story: The community worked on this feature over the past two releases. Version 1.8.0 finalizes the effort with support for POJO state schema evolution. All Flink serializers have also been updated to use the new serialization compatibility abstractions, so Flink no longer Java-serializes serializers into savepoints. If you use custom TypeSerializer implementations for your state serializers, Flink recommends upgrading to the new abstractions. The update also provides pre-defined snapshot implementations for common serializers. (A POJO state sketch follows this list.)
- Cleanup of old state based on time-to-live (FLINK-7811): Time-to-live (TTL) was introduced for keyed state with FLINK-9510. Now, old TTL entries are continuously cleaned up for both the RocksDB state backend and the heap state backend. (A configuration sketch follows this list.)
- SQL pattern detection with user-defined functions and aggregations (FLINK-10597, FLINK-7599): Extends the MATCH_RECOGNIZE clause with custom logic during pattern detection and with aggregations for complex CEP definitions. (An example query follows this list.)
- RFC-compliant CSV format (FLINK-9964): SQL tables can now be read and written in an RFC-compliant CSV format.
- KafkaDeserializationSchema gives direct access to Kafka ConsumerRecord (FLINK-8354): Allows access to all the data Kafka provides for a record, including its headers. This will eventually deprecate the older KeyedDeserializationSchema. (A sample schema implementation follows this list.)
- Per-shard watermarking option in FlinkKinesisConsumer (FLINK-5697): Adds per-shard watermarks for the FlinkKinesisConsumer.
- New consumer for DynamoDB Streams to capture table changes (FLINK-4582): Adds further connectivity to AWS services.
- Support for global aggregates for subtask coordination (FLINK-10887): Allows sharing of information between parallel subtasks with the new GlobalAggregateManager.
- Convenience Hadoop library changes (FLINK-11266): As per the release notes, “Convenience binaries that include hadoop are no longer released”.
- FlinkKafkaConsumer filters out restored partitions no longer associated with a specific topic (FLINK-10342): Users can retain the previous behavior with the disableFilterRestoredPartitionsWithSubscribedTopics() configuration method on the FlinkKafkaConsumer. (See the last sketch after this list.)
- Maven modules Table API changes (FLINK-11064): Users who previously had a flink-table dependency need to update their dependencies to flink-table-planner plus the correct API module, either flink-table-api-java-bridge or flink-table-api-scala-bridge.
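
To make the schema evolution item concrete, here is a minimal sketch of a POJO held in keyed state. The SessionInfo type and SessionTracker function are hypothetical; the point is that, with 1.8, adding or removing a field on such a POJO between savepoints becomes an accepted schema change:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical POJO held in keyed state. Flink's POJO serializer requires
// a public no-argument constructor and public (or getter/setter) fields.
public class SessionInfo {
    public String userId;
    public long lastSeen;
    // A field added here in a later job version is now an accepted schema
    // change when restoring from a savepoint taken with 1.8+.

    public SessionInfo() {}
}

class SessionTracker extends RichFlatMapFunction<SessionInfo, SessionInfo> {
    private transient ValueState<SessionInfo> lastSession;

    @Override
    public void open(Configuration parameters) {
        // The state is (de)serialized with the POJO serializer, which now
        // participates in schema evolution.
        lastSession = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-session", SessionInfo.class));
    }

    @Override
    public void flatMap(SessionInfo value, Collector<SessionInfo> out) throws Exception {
        lastSession.update(value);
        out.collect(value);
    }
}
```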
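For the TTL item, cleanup is configured where the state descriptor is created. A minimal sketch, assuming keyed state and an illustrative state name and seven-day TTL:

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Expire entries seven days after they were last written. With 1.8 the
// backends also clean expired entries up continuously, instead of only
// dropping them when they happen to be read.
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .cleanupFullSnapshot()  // additionally prune expired entries in full snapshots
    .build();

ValueStateDescriptor<String> descriptor =
    new ValueStateDescriptor<>("last-event", String.class);
descriptor.enableTimeToLive(ttlConfig);
```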
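The extended MATCH_RECOGNIZE support can be sketched with an aggregation inside the DEFINE clause. This assumes a registered Ticker table with symbol, price, and rowtime columns:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

// Find, per symbol, runs of rows whose average price stays below 15,
// followed by one more row -- note the aggregate call inside DEFINE.
Table result = tableEnv.sqlQuery(
    "SELECT * " +
    "FROM Ticker " +
    "MATCH_RECOGNIZE ( " +
    "  PARTITION BY symbol " +
    "  ORDER BY rowtime " +
    "  MEASURES " +
    "    FIRST(A.rowtime) AS start_tstamp, " +
    "    AVG(A.price) AS avg_price " +
    "  ONE ROW PER MATCH " +
    "  AFTER MATCH SKIP PAST LAST ROW " +
    "  PATTERN (A+ B) " +
    "  DEFINE " +
    "    A AS AVG(A.price) < 15 " +
    ") MR");
```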
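The new KafkaDeserializationSchema is implemented by overriding three methods. RecordWithMetadataSchema below is a hypothetical example that folds the record's topic, partition, and offset into each emitted string:

```java
import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;

public class RecordWithMetadataSchema implements KafkaDeserializationSchema<String> {

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false;  // read indefinitely
    }

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        // The full ConsumerRecord is available, so metadata (and headers)
        // can be read alongside the payload.
        String value = new String(record.value(), StandardCharsets.UTF_8);
        return record.topic() + "/" + record.partition()
                + "@" + record.offset() + ": " + value;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}
```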
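And opting out of the new Kafka partition filtering looks roughly like this; the broker address, group id, and topic name are placeholders:

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "localhost:9092");
props.setProperty("group.id", "my-group");

FlinkKafkaConsumer<String> consumer =
    new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);

// Opt back into the pre-1.8 behavior: keep restored partition state even
// if a partition's topic is no longer in the subscribed set.
consumer.disableFilterRestoredPartitionsWithSubscribedTopics();
```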
For further reading about the new release, view the entire changelog and release notes.
Our community is happy to announce the release of Flink 1.8.0. Major features include the completion of state evolution support, lazy clean up strategies for state TTL, and improved pattern matching support in SQL. Check out the release announcement: https://t.co/PO94kz2GBg pic.twitter.com/pmFQiA6Dlk
— Apache Flink (@ApacheFlink) April 10, 2019