javacodegeeks.com

Time Series & Deep Learning (Part 3 of N): Finalizing the Data Preparation for Training and Evaluation of a LSTM

In the third part of this series I am going to complete the description, started in part 2, of the data preparation process for training and evaluating an LSTM model for time series forecasting. The data set used is the same as in part 1 and part 2. As for all of the posts in this series, I am referring to Python 3. In part 2 we learned how to transform the time series into a supervised learning problem. That isn't enough yet to feed our LSTM model: two more transformations are needed. The first thing to do is to transform the time...
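The teaser is cut off before naming the two extra transformations. A common choice in LSTM tutorials, and only an assumption here, is to difference the series to remove trend and then scale the values into [-1, 1], the output range of the tanh activation typically used in LSTM cells. The standard-library sketch below also shows the supervised framing from part 2 and the 3-D [samples, timesteps, features] layout that LSTM layers (e.g. in Keras) expect. The helper names `difference`, `invert_difference`, `to_supervised` and `fit_scaler` are hypothetical, and the data is a toy series, not the article's data set.

```python
from typing import Callable, List, Tuple


def difference(series: List[float], lag: int = 1) -> List[float]:
    """Return the lag-differenced series: x[t] - x[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def invert_difference(history: List[float], diff_value: float, lag: int = 1) -> float:
    """Undo differencing for one forecast step using the last observed values."""
    return diff_value + history[-lag]


def to_supervised(series: List[float], n_lags: int = 1) -> List[Tuple[List[float], float]]:
    """Frame the series as (inputs, target) pairs: n_lags past values predict the next one."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]


def fit_scaler(
    values: List[float], low: float = -1.0, high: float = 1.0
) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Fit a min-max scaler (on training data only) and return scale/inverse functions."""
    v_min, v_max = min(values), max(values)
    span = (v_max - v_min) or 1.0  # avoid division by zero for a constant series

    def scale(x: float) -> float:
        return low + (x - v_min) * (high - low) / span

    def inverse(y: float) -> float:
        return v_min + (y - low) * span / (high - low)

    return scale, inverse


# Toy data (hypothetical), not the series used in the article.
raw = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
diffed = difference(raw)                 # remove the trend
train, test = diffed[:5], diffed[5:]     # split before fitting the scaler
scale, inverse = fit_scaler(train)       # fit on training data only, avoiding leakage
train_scaled = [scale(x) for x in train]
pairs = to_supervised(train_scaled, n_lags=1)

# Reshape inputs to [samples, timesteps, features] with a single feature.
X = [[[v] for v in inputs] for inputs, _ in pairs]
y = [target for _, target in pairs]
```

Predictions made on the scaled, differenced scale have to be passed through `inverse` and then `invert_difference` to get back to the original units.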


Sparklens: a tool for Spark applications optimization

Sparklens is a profiling tool for Spark with a built-in Spark scheduler simulator: it makes it easier to understand the scalability limits of Spark applications. It helps in understanding how efficiently a given Spark application is using the compute resources provided to it. It has been implemented and is maintained at Qubole. It is open source (Apache License 2.0) and has been implemented in Scala. One interesting characteristic of Sparklens is its ability to generate estimates with a single run of a Spark application. It reports info such as...
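Sparklens's real simulator models Spark's scheduler in far more detail; what follows is only a toy illustration, under our own simplifying assumptions, of why task timings recorded in a single run can be enough to estimate wall-clock time for other cluster sizes. Each stage's tasks (durations as if taken from the one observed run) are greedily assigned to whichever core frees up first, stages run strictly one after another, and the same measurements are replayed for different core counts. `simulate_stage`, `estimate_app_time` and the task durations are hypothetical and are not part of Sparklens.

```python
import heapq
from typing import Dict, List


def simulate_stage(task_durations: List[float], cores: int) -> float:
    """Greedy list scheduling: each task starts on the core that becomes free first.
    Returns the simulated stage duration (the makespan)."""
    free_at = [0.0] * cores  # min-heap of the times at which each core becomes free
    heapq.heapify(free_at)
    for duration in task_durations:  # tasks are dispatched in submission order
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def estimate_app_time(stages: List[List[float]], cores: int) -> float:
    """Toy assumption: stages never overlap, so their durations simply add up."""
    return sum(simulate_stage(tasks, cores) for tasks in stages)


# Made-up task durations (seconds) for three stages, as if recorded in one run.
observed_stages = [
    [4.0] * 16,
    [2.0, 2.0, 3.0, 9.0],  # one straggler task
    [1.0] * 8,
]

estimates: Dict[int, float] = {
    cores: estimate_app_time(observed_stages, cores) for cores in (1, 2, 4, 8, 16)
}
for cores, seconds in estimates.items():
    print(f"{cores:>2} cores -> {seconds:5.1f} s")
```

Even in this toy model the 9-second straggler keeps the second stage from ever finishing in less than 9 s, so adding cores beyond a certain point stops paying off: that is the kind of scalability limit a scheduler simulator makes visible without re-running the application.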


Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)

In part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can be done in any Scala or Java Spark application: the dependencies used for the Spark shell need to be registered in your build tool of choice (Maven, Gradle or sbt):

groupId: za.co.absa.spline, artifactId: spline-core, version: 0.3.5
groupId: za.co.absa.spline, artifactId: spline-persistence-mongo, version: 0.3.5
groupId: za.co.absa.spline, artifactId: spline-core-spark-adapter-2.3, version: 0.3.5

With reference to Scala and Spark...


Deploying and scaling an Oracle database on a multi-node Kubernetes cluster

In this post I am going to explain how to deploy and scale an Oracle Express database on a multi-node Kubernetes cluster. I am going to use this Docker container by Maxym Bylenko. I am using the container for Oracle XE 11g because, at the time I went through the process described below, there was an open issue with the one for Oracle XE 12c. I am assuming readers have at least a basic to intermediate knowledge of Kubernetes concepts. The first thing to do is to create a Pod. We can do this (and other operations described in this post)...
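kubectl accepts manifests in JSON as well as YAML, so one way to sketch the Pod creation step is to build the manifest as a Python dictionary and print it as JSON for `kubectl apply -f`. The image name is a placeholder (the post links to a specific container that is not reproduced here), and the Pod name, label and port name are hypothetical; port 1521 is simply the default Oracle listener port.

```python
import json

# Placeholder (hypothetical): substitute the Oracle XE 11g image referenced in the post.
ORACLE_IMAGE = "<oracle-xe-11g-image>"


def oracle_pod_manifest(name: str = "oracle-xe", image: str = ORACLE_IMAGE) -> dict:
    """Build a minimal Pod manifest (apiVersion v1, kind Pod) with one Oracle XE container."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # 1521 is the default Oracle listener port.
                    "ports": [{"containerPort": 1521, "name": "listener"}],
                }
            ]
        },
    }


manifest = oracle_pod_manifest()
# Save the output, e.g. `python make_pod.py > oracle-xe-pod.json`,
# then run `kubectl apply -f oracle-xe-pod.json` against the cluster.
print(json.dumps(manifest, indent=2))
```

A bare Pod cannot be scaled on its own; for the scaling part, the same container spec would normally be placed in the template of a controller such as a Deployment or StatefulSet.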