---
license: cc-by-nc-nd-4.0
tags:
- Image
- Captioning
- RESNET-152
- LSTM
---

# Introduction

This model follows the architecture proposed in the book *Mastering PyTorch*.
It combines a CNN encoder with an LSTM decoder.

The CNN encoder is built on a pretrained ResNet-152. The last layer of the ResNet is replaced by an embedding layer that outputs a 256-element vector.
The LSTM decoder takes 256-dimensional inputs, has a hidden state of size 512, and its output dimension equals the vocabulary size.

The model was trained purely as a learning exercise, so its performance is modest.

# Training procedure

For the sake of the exercise, the model was trained on the COCO dataset for only 5 epochs.
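A teacher-forced training loop of this kind can be sketched as below. The card only states the dataset and the number of epochs; the Adam optimizer, the learning rate of 1e-3, the cross-entropy loss with padding index 0, and the choice to optimize only the decoder plus the encoder's new projection layer are all assumptions. The function name `train_captioner` is hypothetical, and it assumes an encoder exposing that projection as `encoder.linear` and a decoder whose output step t predicts caption token t.

```python
import torch
import torch.nn as nn


def train_captioner(encoder, decoder, loader, epochs=5, lr=1e-3, pad_idx=0, device="cpu"):
    """Hypothetical teacher-forced training loop; returns the mean loss of each epoch."""
    encoder.to(device)
    decoder.to(device)
    # Assumption: the pretrained backbone is frozen, so only the decoder and the
    # encoder's new projection layer (encoder.linear) are optimized.
    params = list(decoder.parameters()) + list(encoder.linear.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    # Padding tokens (assumed index pad_idx) are excluded from the loss.
    criterion = nn.CrossEntropyLoss(ignore_index=pad_idx)
    history = []
    for _ in range(epochs):
        encoder.train()
        decoder.train()
        total, batches = 0.0, 0
        for images, captions in loader:
            images, captions = images.to(device), captions.to(device)
            logits = decoder(encoder(images), captions)  # (B, T, vocab_size)
            # Flatten batch and time so each token is one classification target.
            loss = criterion(logits.reshape(-1, logits.size(-1)), captions.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
            batches += 1
        history.append(total / max(batches, 1))
    return history
```

Here `loader` is any iterable of `(images, captions)` batches, for example a `torch.utils.data.DataLoader` over COCO image/caption pairs with captions already tokenized and padded.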

# Support

If you like my work, feel free to support me here:
[buymeacoffee.com/selfmaker](https://buymeacoffee.com/selfmaker)