Hello there,
I am looking for users who have experience with hardcoded subtitles (hardsubs).
I need to extract hardcoded French subtitles from an MP4 video.
I found a few tools, but how to use them is not very clear to me:
VideoSubFinder
video-subtitle-extractor
https://github.com/oliverfei/videocr-PaddleOCR
Maybe some others.
Which is the easiest and best to use with Leap 15.5?
Any advice or howto is welcome.
Let me know if you need more info.
Many thanks
May I suggest that you take a look at the VLC media player – there’s a plugin named “VLsub” which seems to be installed by default – simply open the “View” menu and enable “VLsub”.
Thank you for your suggestion.
I tried it but found nothing.
The subtitle I want is not on any of the subtitle sites I tried, such as subscene, opensubtitles and more.
The subtitles are a French translation burned into a low-resolution copy of a Russian serial. I have a better-resolution original-language version (VO) without subtitles.
So I want to extract the subtitles from the low-resolution copy and add them to the other one.
Then, you could take a look at the video file with FFmpeg – it has a function which may be sufficient for what you’re trying to achieve: <https://trac.ffmpeg.org/wiki/ExtractSubtitles>.
Not at all.
ffmpeg can only extract subtitle tracks, not hardcoded subtitles.
I need an OCR tool that can “read” the text in the image and convert it into real text.
Have a look at the links I provided first.
I am looking for users with experience installing one of these programs; unlike most software, they are not in the repos.
Yes, I looked at those packages –
- Please investigate if they’re available in Flatpak.
The reason is that such tools are better installed within the space of a specific user rather than system-wide …
Did you take a look at “CCExtractor”?
<https://github.com/CCExtractor/ccextractor/releases>
<https://ccextractor.org/>
- But, you’ll have to build it …
An RPM package isn’t available off the shelf …
I searched Flathub before posting: there is no subtitle extractor, only OCR tools.
I just had a look.
Unfortunately, CCExtractor is not a hardcoded sub extractor.
FYI, a hardcoded subtitle extractor works like this: it captures a picture of each subtitle from the video frames (image processing), records its start and stop times in the video, runs OCR on the picture, checks the output words against a dictionary, then writes the text with its times to a file in the right format, usually an .srt file.
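The last step of that pipeline, writing the recognized text and its times to an .srt file, can be sketched in a few lines of Python. This is only an illustration and not how any of the tools listed earlier are implemented; the `Cue` layout, the function names and the two sample cues are made up for the example.

```python
from typing import List, Tuple

# Hypothetical cue layout for this sketch: (start_ms, end_ms, text).
Cue = Tuple[int, int, str]


def srt_timestamp(ms: int) -> str:
    """Format a time in milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: List[Cue]) -> str:
    """Return .srt text for the cues, numbered from 1 in start-time order."""
    blocks = []
    for index, (start, end, text) in enumerate(sorted(cues), start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Two made-up cues standing in for OCR results.
example = [(15_120, 17_640, "Bonjour."), (20_000, 22_500, "Comment allez-vous ?")]
print(build_srt(example))
```

Each block is a running number, a `start --> end` line and the text, which is the layout players expect from an .srt file.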
Perhaps it would help others help you if you posted the information from:
ffmpeg -i <filename>
I think you mean
ffprobe -i filename
So,
> ffprobe -i "17 moments du printemps 1_12.mp4"
ffprobe version 4.4.4 Copyright (c) 2007-2023 the FFmpeg developers
built with gcc 7 (SUSE Linux)
configuration: --prefix=/usr --libdir=/usr/lib64 --shlibdir=/usr/lib64 --incdir=/usr/include/ffmpeg --extra-cflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --optflags='-fmessage-length=0 -grecord-gcc-switches -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -g' --disable-htmlpages --enable-pic --disable-stripping --enable-shared --disable-static --enable-gpl --enable-version3 --disable-openssl --enable-avresample --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcelt --enable-libcdio --enable-libdav1d --enable-libdc1394 --enable-libdrm --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libv4l2 --enable-libvpx --enable-libwebp --enable-libxml2 --enable-libzimg --enable-libzvbi --enable-libmfx --enable-vaapi --enable-vdpau --enable-version3 --enable-libfdk-aac-dlopen --enable-nonfree --enable-libvo-amrwbenc --enable-libx264 --enable-libx265 --enable-librtmp --enable-libxvid
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '17 moments du printemps 1_12.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.45.100
Duration: 01:08:42.61, start: 0.000000, bitrate: 702 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 640x480 [SAR 1:1 DAR 4:3], 602 kb/s, 25 fps, 25 tbr, 90k tbn, 50 tbc (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 93 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream #0:0: Video
Stream #0:1: Audio
There is no subtitle stream, as the subtitles are hardcoded.
Here is a good description of how to use it (don’t click on any download button on this page…):
No need to compile or build anything. Just make the *.run file executable and see whether it works with Leap 15.5.
ffmpeg should have worked for that as well, but either way, this info will help those trying to help you understand the file you’re working with.
As you say, it does seem that you need something that will handle OCR on the burned-in subtitles - I’m not aware of anything that does that, but maybe someone else will have an idea now that you’ve got the specs spelled out.
I downloaded it, unzipped it and changed into the directory.
/VideoSubFinder> ls
bitmaps libnppicc.so.12 libopencv_imgproc.so.407 libtbb.so.2
Docs libnppig.so.12 libopencv_ml.so.407 libvidstab.so.1.1
finished.wav libopencv_calib3d.so.407 libopencv_objdetect.so.407 libwx_baseu-3.2.so.0
libavcodec.so.58.134 libopencv_core.so.407 libopencv_photo.so.407 libwx_gtk3u_aui-3.2.so.0
libavfilter.so.7.110 libopencv_dnn.so.407 libopencv_stitching.so.407 libwx_gtk3u_core-3.2.so.0
libavformat.so.58.76 libopencv_features2d.so.407 libopencv_videoio.so.407 settings
libavresample.so.4.0 libopencv_flann.so.407 libopencv_video.so.407 VideoSubFinderWXW
libavutil.so.56.70 libopencv_gapi.so.407 libpostproc.so.55.9 VideoSubFinderWXW.run
libcudart.so.12 libopencv_highgui.so.407 libswresample.so.3.9
libnppc.so.12 libopencv_imgcodecs.so.407 libswscale.so.5.9
VideoSubFinderWXW.run can be started, but it fails:
./VideoSubFinderWXW.run
./VideoSubFinderWXW: error while loading shared libraries: libjpeg.so.62: cannot open shared object file: No such file or directory
sudo zypper se libjpeg
Loading repository data...
Reading installed packages...
S | Name | Summary | Type
--+-----------------------+-----------------------------------------------------------------------+--------
| libjpeg-turbo | A SIMD-accelerated library for manipulating JPEG image files | package
i | libjpeg8 | A SIMD-accelerated JPEG compression/decompression library | package
| libjpeg8-32bit | A SIMD-accelerated JPEG compression/decompression library | package
| libjpeg8-devel | Development Tools for applications which will use the Libjpeg Library | package
| libjpeg8-devel-32bit | Development Tools for applications which will use the Libjpeg Library | package
| libjpeg62 | A SIMD-accelerated JPEG compression/decompression library | package
| libjpeg62-32bit | A SIMD-accelerated JPEG compression/decompression library | package
| libjpeg62-devel | Development Tools for applications which will use the Libjpeg Library | package
| libjpeg62-devel-32bit | Development Tools for applications which will use the Libjpeg Library | package
| libjpegxr0 | Open source implementation of jpegxr | package
I can’t find which package provides libjpeg.so.62.
Any idea?
Thanks anyway, heanders.
It does strike me that you may need to do something like crop the video to the subtitle area - otherwise your OCR software may pick up non-subtitle text in the images and include that in the output.
Are the subtitles on a black background, or just superimposed over the image with a transparent background? (That will also make a big difference on the software’s ability to extract the subtitle text).
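The cropping idea can be tried with ffmpeg's `crop` filter, which takes `crop=width:height:x:y`. The sketch below only builds the command line for the 640x480 file shown in the ffprobe output; the 25 % band at the bottom of the frame is an assumption and would need adjusting to where the subtitles actually sit, and the output file name is made up.

```python
import shlex
from typing import List


def bottom_crop_filter(width: int, height: int, fraction: float = 0.25) -> str:
    """Build an ffmpeg crop filter that keeps the bottom `fraction` of the frame."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    crop_height = int(height * fraction)
    crop_height -= crop_height % 2  # keep the height even for yuv420p video
    if crop_height < 2:
        raise ValueError("fraction is too small for this frame height")
    top = height - crop_height  # y offset of the first kept row
    return f"crop={width}:{crop_height}:0:{top}"


def crop_command(source: str, target: str, width: int, height: int) -> List[str]:
    """Return the ffmpeg argument list; -an drops the audio, which OCR does not need."""
    return ["ffmpeg", "-i", source, "-vf", bottom_crop_filter(width, height), "-an", target]


# Print the command instead of running it, so it can be checked first.
cmd = crop_command("17 moments du printemps 1_12.mp4", "subs_only.mp4", 640, 480)
print(shlex.join(cmd))
```

With the default fraction the filter becomes `crop=640:120:0:360`, i.e. a 120-pixel strip starting at row 360.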
libjpeg62 is the package that should include that library.
After adding the libjpeg62 package, it is working fine.
I followed the howto and got around 850 .jpeg files.
I had to check all of them because pictures with high contrast are sometimes kept even though they contain no subtitle.
Now around 530 files are left.
Then I installed tesseract plus its French language data.
Fast and easy. As the subtitles have no background, the picture sometimes mixes with the text. After the subtitle creation step, I got a subtitle file. Now I have to correct the badly recognized text.
A bit of work but not too much.
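When checking the remaining images by hand, it can help to know which moment of the video each one belongs to. The sketch below assumes that the image file names start with the start and end times in the form `H_MM_SS_mmm__H_MM_SS_mmm`; that pattern is an assumption and should be compared with the actual file names first. The function names and sample names are made up.

```python
import re
from typing import Optional, Tuple

# Assumed pattern at the start of each file name: H_MM_SS_mmm__H_MM_SS_mmm
_TIMING = re.compile(r"^(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})")


def _to_ms(hours: str, minutes: str, seconds: str, millis: str) -> int:
    """Convert the four captured fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_timing(filename: str) -> Optional[Tuple[int, int]]:
    """Return (start_ms, end_ms) parsed from the file name, or None if it does not match."""
    match = _TIMING.match(filename)
    if match is None:
        return None
    fields = match.groups()
    return _to_ms(*fields[:4]), _to_ms(*fields[4:])


# Made-up file names, listed out of order on purpose.
names = ["0_01_02_003__0_01_04_500.jpeg", "0_00_15_120__0_00_17_640.jpeg"]
for name in sorted(names, key=parse_timing):
    print(name, parse_timing(name))
```

Sorting by the parsed times gives the images in playback order, which makes it easier to compare them with the video.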
It was the first time I did this; not too bad!
Many thanks for your help.
zypper search --provides libjpeg.so.62
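To check for every missing library at once instead of hitting them one error at a time, the output of `ldd` on the binary can be scanned for `not found` lines. The sketch below parses a hand-written sample in the format ldd prints, not real output from this system, and then prints the matching zypper search for each missing library; the function name is made up.

```python
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# Hand-written sample in ldd's output format (illustrative only).
sample = "\n".join(
    [
        "\tlinux-vdso.so.1 (0x00007ffd3a5f2000)",
        "\tlibjpeg.so.62 => not found",
        "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1c2a000000)",
    ]
)

for lib in missing_libraries(sample):
    # Same query as the zypper command above, one per missing library.
    print(f"zypper search --provides {lib}")
```

On a real system the sample string would be replaced by the text printed by `ldd ./VideoSubFinderWXW`.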