2022-12-30, 14:58:47
(2022-12-30, 10:58:55) Ralome wrote: (2022-12-29, 22:52:08) Mor Tuadh wrote: It can be extracted, it's just a bit fiddly.
I can confirm, or rather...
(2022-12-30, 02:09:29) remi wrote: I partially managed to solve it with a program called VideoSubFinder. It extracted the subtitles and generated JPGs from them, with the timing as the filename, but that's where the whole stunt got stuck
After that I used a separate OCR program to make .txt files with the timing in their filenames, and from there it was quick work to write the PHP code that assembled the finished .srt from them. So yes, the whole thing really is a bit "desperate, much?"
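For anyone curious, the assembly step described above could look roughly like the sketch below. It is in Python rather than PHP and is not the original script. The filename pattern (start and end timecodes written as H_MM_SS_mmm and joined by a double underscore) is an assumption, and the function names are made up for illustration, so adjust the regex to whatever your files are actually called.

```python
import re
from pathlib import Path

# Assumed filename pattern: start and end timecodes as H_MM_SS_mmm, joined by "__".
# This is a guess based on the description above; adapt it to your real filenames.
NAME_RE = re.compile(
    r"^(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})"
)


def srt_time(h, m, s, ms):
    """Format one timecode the way .srt expects it: HH:MM:SS,mmm."""
    return f"{int(h):02d}:{m}:{s},{ms}"


def build_srt(entries):
    """Turn (filename, text) pairs into .srt content, ordered by start time."""
    cues = []
    for name, text in entries:
        match = NAME_RE.match(name)
        text = text.strip()
        if not match or not text:
            continue  # skip files without timing in the name or without text
        g = match.groups()
        start_key = tuple(int(x) for x in g[:4])
        cues.append((start_key, srt_time(*g[:4]), srt_time(*g[4:]), text))
    cues.sort(key=lambda cue: cue[0])
    blocks = [
        f"{i}\n{start} --> {end}\n{text}\n"
        for i, (_, start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


def build_srt_from_folder(folder):
    """Read every .txt in a folder (hypothetical layout) and build the .srt text."""
    files = sorted(Path(folder).glob("*.txt"))
    # "utf-8-sig" drops a leading BOM if the OCR program wrote one
    return build_srt((f.name, f.read_text(encoding="utf-8-sig")) for f in files)
```

Calling `build_srt_from_folder("TXTResults")` and writing the returned string to a .srt file would be the whole job; files whose names do not match the assumed pattern are simply skipped.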
I followed this guide when I did something like this. There is also a more detailed version; if anyone needs it, let me know and I'll send it by e-mail.
You need VideoSubFinder + FineReader for it
1. Open your video in VideoSubFinder. Set the working area by moving the upper edge of the
Video Box. Hit 'Run Search' for the program to scan the video and save the screenshots of
the frames containing subtitles. The image filenames will contain the timecode and duration
info, which will be used by the program later.
2. Go to the VideoSubFinder's RGBImages subfolder, skim through the files and delete the
ones that do not contain subtitles.
3. Go back to VideoSubFinder's OCR tab and hit 'Create Cleared TXT Images', which will
launch the process of cleaning up the images in the RGBImages folder.
4. Once the process is over, import the collection of generated images from the TXTImages
folder to FineReader to be OCR'd (see the FR guide below if you are new to the program).
5. Save the OCR result in .txt format in the VideoSubFinder's TXTResults subfolder.
.txt doesn't preserve formatting, but we got no choice really, VideoSubFinder only works with .txt.
6. Launch VideoSubFinder, go to the OCR tab and hit 'Create Sub From TXT Results' to
merge the saved .txt files into a single .srt file.
7. Open the .srt file in Notepad++, select 'Encode in ANSI' in the Encoding menu, replace the
BOM characters (typically shown as ï»¿) with blank, select 'Encode in UTF-8' in the Encoding menu, save.*
8. Watch the hardsubbed version with external soft subs to detect any errors & OCR mishaps,
retrieve formatting and make sure nothing is missing.
*The BOM characters are added by FineReader when saving in UTF-8, and it seems like there's no way around this little snag. If you don't get rid of them, they'll get in the way of automated formatting fixes, say, in SubtitleEdit. Also, these crappy characters may look different when you open the subtitle in Notepad++. Just look for a bunch of weird characters at the beginning of all lines and replace them with blank.
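If you'd rather not do the Notepad++ round trip by hand, the BOM cleanup from step 7 can be sketched in a few lines of Python. This is only a sketch under the assumption that the stray characters really are UTF-8 byte order marks (the bytes EF BB BF); the function names and the file path are hypothetical.

```python
from pathlib import Path

# The UTF-8 byte order mark; these three bytes display as "ï»¿" in an ANSI view.
UTF8_BOM = b"\xef\xbb\xbf"


def strip_boms(data: bytes) -> bytes:
    """Remove every UTF-8 BOM, wherever it sits in the data."""
    return data.replace(UTF8_BOM, b"")


def clean_srt_file(path):
    """Rewrite the file in place without BOMs; line endings stay untouched."""
    p = Path(path)
    p.write_bytes(strip_boms(p.read_bytes()))
```

Working on raw bytes keeps the rest of the file exactly as it was, so something like `clean_srt_file("movie.srt")` only removes the BOMs. Keep a backup copy of the .srt before running it.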