Digitization Handbook

Convert them into smaller files (dimensions, formats …)

Repositories may have limited support for some formats, mostly audio and video files. Before you start converting files, check whether the target format is supported.

Converting to smaller files saves storage space, speeds up delivery on the web, and makes the files faster to work with. In the end these files are shown on screens, which have a relatively low resolution: apart from 4K screens at 3840x2160, most are Full HD (1920x1080) or smaller, so a page scanned at 2423x3325 pixels is much larger than the screen can show at once.
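To make the screen comparison concrete, here is a minimal sketch in Python of how a target size could be computed so that a scan fits inside a given screen while keeping its aspect ratio. The function name `fit_within` and the decision never to upscale are assumptions made for illustration; they are not taken from any particular conversion tool.

```python
def fit_within(width, height, max_width, max_height):
    """Return (width, height) scaled down to fit inside the given box.

    The aspect ratio is preserved, and images that already fit are
    returned unchanged (the scale factor is capped at 1.0).
    """
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


# The scanned page from the text, fitted to a Full HD screen.
print(fit_within(2423, 3325, 1920, 1080))  # (787, 1080)

# The same page fitted to a 4K screen.
print(fit_within(2423, 3325, 3840, 2160))  # (1574, 2160)
```

The resulting pixel dimensions can then be passed to whatever image tool performs the actual resize.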

Different materials call for different conversions. Printed text needs 300 dpi for good OCR results, so the pixel dimensions of the pages vary with their physical size. Feel free to adjust the size of the scanned file to suit. For photographs, maps, paintings, and similar material, larger dimensions are better because these items may contain details that viewers will want to zoom in on. They don’t need 300 dpi for OCR, though; images for the web are usually 72 dpi or 96 dpi, so they can be converted to that value.
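The relationship between physical size, dpi, and pixel dimensions can be sketched as follows. This is only an illustration: the helper name `pixels_at_dpi` is hypothetical, and the A4 page (210 x 297 mm) is an assumed example size, not one taken from the handbook.

```python
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm, height_mm, dpi):
    """Return the pixel dimensions of a page scanned at the given dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


# Assumed example: an A4 page at the 300 dpi recommended for OCR.
print(pixels_at_dpi(210, 297, 300))  # (2480, 3508)

# The same page at 96 dpi, a typical value for web images.
print(pixels_at_dpi(210, 297, 96))   # (794, 1123)
```

A smaller page at the same 300 dpi gives proportionally fewer pixels, which is why the dimensions of scanned text pages vary with the size of the original.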

Audio material can be converted to MP3 at a higher bitrate, measured in kbps (kilobits per second); the maximum is 320 kbps, which gives good quality. There is also the FLAC format, which gives better quality but results in bigger files. Some repositories recognize only WAV files, so in that case, as mentioned before, keep the files in that format.
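A rough size estimate shows what the bitrate means in practice. The sketch below assumes a constant bitrate for MP3 and, for comparison, an uncompressed WAV at 44.1 kHz, 16-bit stereo; those WAV parameters and both function names are assumptions for illustration. Container headers and metadata are ignored.

```python
def mp3_size_bytes(kbps, seconds):
    """Approximate size of a constant-bitrate MP3 (kbps = kilobits per second)."""
    return kbps * 1000 * seconds // 8  # 8 bits per byte


def wav_size_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Approximate size of uncompressed PCM audio, without the file header."""
    return sample_rate * channels * bytes_per_sample * seconds


one_minute = 60
print(mp3_size_bytes(320, one_minute))  # 2400000 bytes, about 2.4 MB
print(wav_size_bytes(one_minute))       # 10584000 bytes, about 10.6 MB
```

FLAC is lossless and its size depends on the content, so it is not estimated here; it falls somewhere between these two figures for typical recordings.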

Video material stays in the same MP4 format.

Books and printed texts must have 300 dpi for OCR, but they don’t need large dimensions.