Saturday, July 23, 2016

MapReduce in Apache Spark

Based on the Course CS120x Distributed Machine Learning with Apache Spark.

The map/reduce paradigm can be summarized as follows:

Map: transforms a series of elements by applying a function individually to each element in the series. It then returns the series of transformed elements.
Filter: applies a function individually to each element in a series, but the function evaluates to True or False and only the elements that evaluate to True are retained.
Reduce: operates on pairs of elements in a series. It applies a function that takes in two values and returns a single value. Using this function, reduce is able to, iteratively, “reduce” a series to a single value.
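These three operations are not specific to Spark. As a minimal sketch, the same ideas can be tried locally with Python's built-in `map` and `filter` and with `functools.reduce`; the word list and variable names below are only illustrative.

```python
from functools import reduce

words = ["spark", "map", "reduce", "filter"]

# Map: apply a function to every element and keep all the results
lengths = list(map(lambda w: len(w), words))

# Filter: keep only the elements for which the predicate is True
long_words = list(filter(lambda w: len(w) > 3, words))

# Reduce: combine elements two at a time until a single value remains
total_length = reduce(lambda x1, x2: x1 + x2, lengths)

print(lengths)
print(long_words)
print(total_length)
```

Note that `map` and `filter` return a series, while `reduce` returns a single value.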

We define a sequence of 10 elements and transform it into a Resilient Distributed Dataset (RDD) split into 4 partitions:

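# sc is assumed to be an existing SparkContext (e.g. the one a notebook provides)
numbers = range(0, 10)
numberRDD = sc.parallelize(numbers, 4)  # distribute the data over 4 partitions
numberRDD.collect()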
> Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Map numberRDD using a lambda function that multiplies each element by 5:

numberRDD.map(lambda x:x*5).collect()
> Out[2]: [0, 5, 10, 15, 20, 25, 30, 35, 40, 45]

Filter numberRDD to keep only the multiples of 2:

numberRDD.filter(lambda x:x%2==0).collect()
> Out[3]: [0, 2, 4, 6, 8]

Reduce numberRDD by summing pairs of numbers:

numberRDD.reduce(lambda x1,x2:x1+x2)
> Out[4]: 45
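Because the RDD is split into partitions, Spark reduces each partition on its own and then combines the partial results, which is why the function passed to `reduce` should be commutative and associative. The following pure-Python sketch imitates that behaviour. `split_into_partitions` and `partitioned_reduce` are hypothetical helpers, not Spark APIs, and the contiguous slicing is only an assumption about how elements might be distributed.

```python
from functools import reduce

def split_into_partitions(data, num_partitions):
    """Split a list into contiguous chunks (hypothetical helper, not a Spark API)."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions

def partitioned_reduce(func, data, num_partitions):
    """Reduce each non-empty partition, then reduce the partial results."""
    partitions = split_into_partitions(data, num_partitions)
    partials = [reduce(func, p) for p in partitions if p]
    return reduce(func, partials)

numbers = list(range(10))
add = lambda x1, x2: x1 + x2
subtract = lambda x1, x2: x1 - x2

# Addition is commutative and associative: both strategies agree
sum_partitioned = partitioned_reduce(add, numbers, 4)
sum_sequential = reduce(add, numbers)

# Subtraction is neither: the result depends on how the data is split
diff_partitioned = partitioned_reduce(subtract, numbers, 4)
diff_sequential = reduce(subtract, numbers)

print(sum_partitioned, sum_sequential)
print(diff_partitioned, diff_sequential)
```

With addition the partitioned and sequential results match, while with subtraction they differ, so a reduce function like `lambda x1, x2: x1 - x2` would give results that depend on the partitioning.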

Putting it all together, we multiply each number by 5, keep only the even results, and sum them:

numberRDD.map(lambda x:x*5).filter(lambda x:x%2==0).reduce(lambda x1,x2:x1+x2)
> Out[5]: 100

This post has been written using Markdown and Dillinger. Here is an interesting Markdown Cheatsheet.

Sunday, July 17, 2016

Convolutional Layer of CNN in one Picture

A complete course at Stanford is devoted to Convolutional Neural Networks.
The Course Notes (by Andrej Karpathy) are well written and worth a look.

Those course notes inspired me to create a picture summarising some concepts.

An interesting summary (adapted from here) is the following:

Input Layer

Size: $W_1 \times H_1 \times D_1$
Hyperparameters:

Number of filters $K$
Dimension of the filter $F \times F \times D_1$
Stride: $S$
Amount of Zero Padding: $P$

Output Layer

Size: $W_2 \times H_2 \times D_2$
$W_2 = \frac{W_1 - F + 2P}{S} + 1$
$H_2 = \frac{H_1 - F + 2P}{S} + 1$
$D_2 = K$
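To make the output-size formulas concrete, here is a small sketch that evaluates them. `conv_output_shape` is a hypothetical helper name, and the example values (a $32 \times 32 \times 3$ input, 10 filters of size 5, stride 1, padding 2) are chosen only for illustration. The divisibility check is an added assumption: it rejects hyperparameters for which the filter does not fit the padded input evenly.

```python
def conv_output_shape(w1, h1, d1, k, f, s, p):
    """Return (W2, H2, D2) for a convolutional layer.

    Raises ValueError when (W1 - F + 2P) or (H1 - F + 2P) is not divisible by S.
    """
    if (w1 - f + 2 * p) % s != 0 or (h1 - f + 2 * p) % s != 0:
        raise ValueError("hyperparameters do not fit the input size")
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k  # the output depth equals the number of filters; d1 plays no role here
    return w2, h2, d2

shape = conv_output_shape(32, 32, 3, k=10, f=5, s=1, p=2)
print(shape)
```

With these values the spatial size is preserved, since $(32 - 5 + 2 \cdot 2)/1 + 1 = 32$.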

Thanks to parameter sharing, each filter has $F \times F \times D_1$ weights, for a total of $(F \times F \times D_1) \times K$ weights and $K$ biases.
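The parameter count can be checked with a few lines of arithmetic. This is only a sketch: `conv_param_count` is a hypothetical helper, and the example values are illustrative.

```python
def conv_param_count(f, d1, k):
    """Return (total_weights, biases) for K filters of size F x F x D1."""
    weights_per_filter = f * f * d1
    total_weights = weights_per_filter * k
    biases = k  # one bias per filter
    return total_weights, biases

weights, biases = conv_param_count(f=5, d1=3, k=10)
print(weights, biases, weights + biases)
```

Note that the count does not depend on the input width or height, only on the filter size, the input depth, and the number of filters.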

In the output volume, the $d$-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the $d$-th filter over the input volume with a stride of $S$, and then offsetting it by the $d$-th bias.
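As a rough illustration of how one depth slice is produced, the following naive pure-Python sketch slides a single filter over a zero-padded input volume. `conv_depth_slice` is a hypothetical helper, the volume is assumed to be stored as nested lists indexed `[depth][row][column]`, and, as is common in CNN code, the filter is applied without flipping (strictly speaking a cross-correlation).

```python
def conv_depth_slice(volume, filt, bias, stride, pad):
    """Compute one output depth slice.

    volume: nested lists [D1][H1][W1]; filt: nested lists [D1][F][F].
    """
    d1 = len(volume)
    h1, w1 = len(volume[0]), len(volume[0][0])
    f = len(filt[0])
    # Zero-pad every input channel on all four sides
    padded = [
        [[0] * (w1 + 2 * pad) for _ in range(pad)]
        + [[0] * pad + list(row) + [0] * pad for row in channel]
        + [[0] * (w1 + 2 * pad) for _ in range(pad)]
        for channel in volume
    ]
    h2 = (h1 - f + 2 * pad) // stride + 1
    w2 = (w1 - f + 2 * pad) // stride + 1
    out = []
    for i in range(h2):
        row_out = []
        for j in range(w2):
            acc = bias  # start from the bias of this filter
            for c in range(d1):
                for u in range(f):
                    for v in range(f):
                        acc += padded[c][i * stride + u][j * stride + v] * filt[c][u][v]
            row_out.append(acc)
        out.append(row_out)
    return out

volume = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]  # D1 = 1, H1 = W1 = 3
filt = [[[1, 1], [1, 1]]]                     # F = 2
slice_0 = conv_depth_slice(volume, filt, bias=1, stride=1, pad=0)
print(slice_0)
```

Repeating this for each of the $K$ filters and stacking the results along the depth axis would give the full output volume.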

Another interesting post on Convolutional Neural Networks is here.

Notes on Machine Learning, AI, Big Data etc etc
