The Vector from Japanese to Western Movies

Nothing in this post is new research; this is all relatively basic machine learning. But it is fun.

This is the first day of my vacation. Time for some machine learning! Today, I will create embeddings. I will use the relatively big dataset from the Netflix Prize competition to create embeddings for movies.

We develop a way to automatically find the American counterpart of a Japanese movie!

After some preprocessing of the Netflix data, I ended up with 100,000,000 rows of data looking like this:

array([[ 14550, 108683,      4],
       [ 10583, 222881,      2],
       [ 16278,   3416,      5],
       [  8131, 144114,      3],
       [  5862, 477329,      5],
       [ 10928, 225987,      3]])

The first column is the movie ID, the second column is the user ID, and the last one is the rating (1–5) that the user gave the movie. I know the titles of the movies, but have not made them available to the algorithm.
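Feeding this into a model later just means slicing the array into its columns. A minimal sketch, using a hypothetical three-row sample of the data (the actual preprocessing code is not shown in this post):

```python
import numpy as np

# Hypothetical sample of the preprocessed data: [movie ID, user ID, rating].
ratings = np.array([[14550, 108683, 4],
                    [10583, 222881, 2],
                    [16278,   3416, 5]])

movie_ids = ratings[:, 0]  # first column: movie IDs
user_ids = ratings[:, 1]   # second column: user IDs
targets = ratings[:, 2]    # third column: ratings 1-5
```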

I used this data to train the following neural network:


The network takes the movie and user IDs as input, and returns the rating.

After training the neural network, we have a 40-dimensional embedding of every movie in the dataset. These vectors are hard to visualize directly, but we can visualize the first two dimensions after applying PCA. Here are the 200 movies with the most ratings:
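The projection down to two dimensions can be done with scikit-learn's PCA. A minimal sketch, where `movie_points` stands in for the learned embedding matrix (one 40-dimensional row per movie; random data here just to make the snippet runnable):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the learned embedding matrix: one 40-dimensional row per movie.
rng = np.random.default_rng(0)
movie_points = rng.normal(size=(200, 40))

# Project onto the two directions of highest variance for plotting.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(movie_points)
print(points_2d.shape)  # (200, 2)
```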


Some things make sense here. All three Lord of the Rings movies are really close to each other and so are the Kill Bill movies. Remember, the network did not have access to the names of the movies.

But perhaps the network has learnt some more interesting structure of the movies? Let’s define the following function:

def find_american_counterpart(japanese_name, n=10):
  # Use The Ring/Ringu as the canonical example
  from_japan_to_usa = (movie_points[movie_id["The Ring"], :]
                       - movie_points[movie_id["Ringu"], :])
  print_closest(movie_points[movie_id[japanese_name], :]
                + from_japan_to_usa, n)

This function takes the name of a Japanese movie and moves in the 40-dimensional space in exactly the same direction and distance as from Ringu to The Ring.
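`print_closest` is not shown in the post; a plausible implementation simply ranks every movie by Euclidean distance from the query point in the embedding space. A hedged sketch on toy 2-D data (the real version would use the 40-dimensional `movie_points` matrix and the `movie_id` title lookup):

```python
import numpy as np

# Toy embedding matrix and titles; the real ones come from the trained network.
movie_points = np.array([[0.0, 0.0],
                         [1.0, 0.0],
                         [5.0, 5.0]])
titles = ["A", "B", "C"]

def closest_titles(point, n=2):
    # Rank movies by Euclidean distance to the query point.
    dists = np.linalg.norm(movie_points - point, axis=1)
    return [titles[i] for i in np.argsort(dists)[:n]]

def print_closest(point, n=10):
    for title in closest_titles(point, n):
        print(title)

print_closest(np.array([0.9, 0.1]), n=2)
```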

Let’s try it on an old Kurosawa movie:

>>>find_american_counterpart("Yojimbo", 1)

A Fistful of Dollars

Bingo! 😀 I am a bit surprised that this worked! As a sanity check, it is good to print the movies closest to Yojimbo itself, to make sure A Fistful of Dollars is not simply Yojimbo's nearest neighbor.

>>>print_closest(movie_points[movie_id["Yojimbo"], :])

Throne of Blood
Hidden Fortress
The Third Man
The Big Sleep
Modern Times
Black Adder II

Looks good! Here is a visualization of the vectors in 2D, with a couple of thousand other movies plotted as well:

These are the real movie points and the real vector, but a lot of information is lost when going from 40 dimensions down to only two.

Let’s try the same vector with The Grudge:

>>>find_american_counterpart("Ju-on: The Grudge")

The Ring
The Legend of Sleepy Hollow
Minority Report
Stir of Echoes
The Grudge
Terminator 3: Rise of the Machines

This did not work as well (9th from the top). But at least The Grudge is not particularly close to Ju-on: The Grudge, which means the counterpart function did some useful work.

For the interested, here is the Keras code generating the neural network:

from keras.layers import Input, Embedding, Concatenate, Flatten, Dense
from keras.models import Model

movie = Input(shape=(1,), dtype='int32', name='Movie')
user = Input(shape=(1,), dtype='int32', name='User')

movie_emb = Embedding(num_movies, 40, name='MovieEmbedding')(movie)
user_emb = Embedding(num_users, 40, name='UserEmbedding')(user)

# Concatenate the two 40-dimensional embeddings into one 80-dimensional vector.
concat = Concatenate()([movie_emb, user_emb])
x = Flatten()(concat)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
output = Dense(1)(x)

model = Model(inputs=[movie, user], outputs=output)
model.compile(loss='mean_squared_error', optimizer='adadelta')
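Training then amounts to feeding the two ID columns as inputs and the ratings as the target, after which the movie embeddings can be read straight out of the embedding layer. A hedged, self-contained sketch with tiny stand-in sizes so it runs quickly (the counts, sample rows, and epoch count are placeholders, not the values used in the post):

```python
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Concatenate, Flatten, Dense
from tensorflow.keras.models import Model

# Tiny stand-in sizes; the real dataset has thousands of movies,
# hundreds of thousands of users, and 100,000,000 ratings.
num_movies, num_users = 10, 20

movie = Input(shape=(1,), dtype='int32', name='Movie')
user = Input(shape=(1,), dtype='int32', name='User')
movie_emb = Embedding(num_movies, 40, name='MovieEmbedding')(movie)
user_emb = Embedding(num_users, 40, name='UserEmbedding')(user)
x = Flatten()(Concatenate()([movie_emb, user_emb]))
x = Dense(64, activation='relu')(x)
output = Dense(1)(x)
model = Model(inputs=[movie, user], outputs=output)
model.compile(loss='mean_squared_error', optimizer='adadelta')

# Train on the two ID columns, with the rating column as the target.
data = np.array([[3, 5, 4], [7, 1, 2], [0, 19, 5]])
model.fit([data[:, 0], data[:, 1]], data[:, 2], epochs=1, verbose=0)

# The learned movie embeddings are the weights of the embedding layer.
movie_points = model.get_layer('MovieEmbedding').get_weights()[0]
print(movie_points.shape)  # (10, 40)
```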
