doc:appunti:linux:video:ripping_dvds_with_mencoder
                Differences
This shows you the differences between two versions of the page.
| doc:appunti:linux:video:ripping_dvds_with_mencoder [2017/10/12 09:30] – [Extracting the subtitles] niccolo | doc:appunti:linux:video:ripping_dvds_with_mencoder [2020/04/21 17:05] (current) – [OCRing] niccolo | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Ripping DVDs with Mencoder ====== | ====== Ripping DVDs with Mencoder ====== | ||
| + | :!: For a simple recipe to rip (extract) the content of a DVD using Debian 10, see **[[vobcopy]]**. | ||
| ===== Install the necessary programs ===== | ===== Install the necessary programs ===== | ||
| Line 199: | Line 200: | ||
| Now, we skip the first pass of the video encode, and remove the '' | Now, we skip the first pass of the video encode, and remove the '' | ||
| - | ===== Subtitles ===== | + | ===== Extract | 
| + | |||
| + | FIXME The following programs are **missing in Debian 10 Buster**: **tcextract**, | ||
| DVDs have subtitles stored as images. There are some options for dealing with them: | DVDs have subtitles stored as images. There are some options for dealing with them: | ||
| Line 228: | Line 231: | ||
| </ | </ | ||
| - | Now use transcode | + | The **tccat** command will concatenate all the files that compose the specified '' | 
| + | |||
| + | The **tcextract** command | ||
| + | |||
| + | **NOTICE**: The number **0x21** is **0x20** + the subtitle ID. | ||
| < | < | ||
| - | tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x22 > subs-en | + | tccat -i $RIPDIR -T $TITLE -L | tcextract -x ps1 -t vob -a 0x21 > subtitles_stream.ps1 | 
| </ | </ | ||
| - | where 0x22 is 0x20 + the subtitle ID. | + | If you have just the .VOB files, you can use this recipe: | 
| + | |||
| + | < | ||
| + | cat VTS_02_? | ||
| + | </ | ||
| - | If you want vobsub | + | Use the **[[subtitleripper]]** scripts to obtain the VobSub | 
| < | < | ||
| - | subtitle2vobsub -o vobsubs-en | + | subtitle2vobsub -p subtitles_stream.ps1 | 
| </ | </ | ||
| + | We used the .IFO file of the selected DVD track (#2 in the example). The subtitles will be saved into the [[glossary# | ||
| + | |||
| + | If you need to extract only part of the subtitle stream (e.g. if you have cut the original track into several pieces), use the **-e** option to specify the **start**, the **end** and a **new_start** (new time offset) of the extraction, in **seconds**, | ||
| + | |||
| + | < | ||
| + | subtitle2vobsub -p subtitles_stream.ps1 \ | ||
| + | -i $RIPDIR/ | ||
| + | -e 9673.914, | ||
| + | </ | ||
| ==== OCRing ==== | ==== OCRing ==== | ||
| Line 247: | Line 267: | ||
| < | < | ||
| - | subtitle2pgm | + | cat subtitles_stream.ps1 | subtitle2pgm | 
| </ | </ | ||
| - | Each subtitle should now be one pgm file, and a srtx file will be created | + | If you want to control how grey levels are converted, use the **%%-c%%** option of subtitle2pgm, | ||
| - | Now to ocr all that with gocr (using a nice wrapper for the job): | + | Each subtitle should now be one file named like **movie_subtitle0003.pgm**, | 
| + | |||
| + | === With Tesseract OCR === | ||
| + | |||
| + | <code bash> | ||
| + | #!/bin/sh | ||
| + | find . -type f -name ' | ||
| + | echo -n " | ||
| + | tesseract -l eng --psm 4 " | ||
| + | done | ||
| + | </ | ||
| + | |||
| + | === With Gocr === | ||
| + | |||
| + | **NOTICE**: Don't use the following, because Gocr is not the best tool for OCR. Use **Tesseract OCR** instead. | ||
| + | |||
| + | To OCR all the .pgm images with **gocr** (using a nice wrapper for the job): | ||
| < | < | ||
| - | pgm2txt | + | pgm2txt | 
| </ | </ | ||
| It will prompt you for tons of characters that it doesn' | It will prompt you for tons of characters that it doesn' | ||
| - | We will re-merge all these text files produced into a big subtitle file: | + | ==== Make a single .srt file ==== | 
| + | |||
| + | Now we merge all the text files produced into a single subtitle file: | ||
| < | < | ||
| - | srttool -s -w < english.srtx > english.srt | + | srttool -s -w < movie_subtitle.srtx > movie_subtitle.srt | 
| </ | </ | ||
| Line 285: | Line 323: | ||
| You can now add english.srt onto the end of your '' | You can now add english.srt onto the end of your '' | ||
| + | ==== Fixing time, etc ==== | ||
| + | |||
| + | Finally, you can proofread the final .srt file with **Gaupol**, a full-featured graphical subtitle editor. It handles some of the most common operations required: | ||
| + | |||
| + | * **Shift times**, from //Tools//, //Shift Positions...// | ||
| + | * **Renumber subtitles**, | ||
| ===== Links ===== | ===== Links ===== | ||
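
The NOTICE in the subtitle extraction step says that the number passed to ''tcextract -a'' is **0x20** plus the subtitle ID. As a minimal sketch of that arithmetic, the snippet below computes the value for a given ID. The function name ''subtitle_stream_id'' is hypothetical and is not part of transcode or subtitleripper.

```shell
#!/bin/sh
# Hypothetical helper (not part of transcode): map a DVD subtitle ID
# to the stream number used with "tcextract -a", i.e. 0x20 + ID.
subtitle_stream_id() {
    # Shell arithmetic accepts the hexadecimal constant 0x20;
    # printf formats the sum back as two hex digits.
    printf '0x%02x\n' $((0x20 + $1))
}

# Subtitle ID 1, as in the recipe above.
subtitle_stream_id 1
```

With subtitle ID 1 this gives the 0x21 used in the newer revision of the command, and with ID 2 the 0x22 used in the older one.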
doc/appunti/linux/video/ripping_dvds_with_mencoder.1507793446.txt.gz · Last modified:  by niccolo
                
                