500+500 CZK bounty: Requesting ceskatelevize.cz closed captions

January 23, 2023 08:25PM
Hi,
I have found out how to manually download closed captions (CCs) from ceskatelevize.cz (Czech TV). Please add this feature to the ceskatelevize.cz plugin! I really want this done, so I tried my best (given my zero Java skills) to make the implementation as straightforward as possible: an experienced plugin developer following my guide should be able to add the feature in under 30 minutes. The following needs to be written:
  1. Add a boolean ccEnabled in CeskaTelevizeSettingsConfig.java (false by default) and the relevant "Include closed captions/Stáhnout skryté titulky" checkbox in CeskaTelevizeSettingsPanel.java
  2. Obviously, only proceed with the following steps if ccEnabled is true; otherwise skip them (or break, or whatever this action is called in Java).
  3. Apply the regular expression replacement below. (This version works on the most common URL format)
      FIND: https\:\/\/www\.ceskatelevize\.cz\/porady\/[^\/]*\/(\d{3})(\d*)[\S]*
      REPLACE: https\:\/\/imgct\.ceskatelevize\.cz\/cache\/data\/ivysilani\/subtitles\/$1\/$1$2\/sub.vtt
    (What the regex does in human language: take the second number in the URL (videoId) and its first three digits (vid) and insert them into this URL: https://imgct.ceskatelevize.cz/cache/data/ivysilani/subtitles/vid/videoId/sub.vtt. Example: https://www.ceskatelevize.cz/porady/15496675472-prezidentske-volby/223411033110112/ => https://imgct.ceskatelevize.cz/cache/data/ivysilani/subtitles/223/223411033110112/sub.vtt) Note that not all URLs explicitly contain the video ID; in that case it has to be retrieved in another way.

    Turns out, the URL parsing has been solved in the code already, in a much more robust way than my attempt above. The video ID, if not obtained from the URL, is found on line 300 in CeskaTelevizeFileRunner.java as the IDEC parameter of the player iframe. The subtitle URL is then found using the following regex on videoId:
      FIND: (\d{3})(\d*)
      REPLACE: https\:\/\/imgct\.ceskatelevize\.cz\/cache\/data\/ivysilani\/subtitles\/$1\/$1$2\/sub.vtt
    Of course, you can just use plain Java code to concatenate "https://imgct.ceskatelevize.cz/cache/data/ivysilani/subtitles/" + firstThreeCharacters(videoId) + "/" + videoId + "/sub.vtt".
  4. If the regex match was successful, add the address obtained in the previous step to the download list as a new item and name the file filename.vtt so that VLC finds it. Here, filename is the string in its state just before the default extension .ts is appended on line 130 in CeskaTelevizeFileRunner.java, so the resulting subtitle file has the same name as the downloaded video minus the file extension. It is just a plain direct download that will succeed if direct downloads are enabled. Alternatively, to be independent of that setting, add functionality to CeskaTelevizeFileRunner.java to handle the file itself. This might also make it easier to check whether the file is missing; see the note below.
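Steps 3 and 4 could look roughly like this in plain Java. This is only a sketch: the class and method names (CcUrlSketch, fromVideoId, fromPageUrl, subtitleFileName) are made up for illustration and are not part of the existing plugin, and the code that adds the result to the download list is left out because it depends on the plugin's own API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of the existing plugin source.
final class CcUrlSketch {
    private static final String SUBTITLE_BASE =
            "https://imgct.ceskatelevize.cz/cache/data/ivysilani/subtitles/";

    // Same pattern as the FIND expression in step 3 (most common URL format only).
    private static final Pattern PAGE_URL = Pattern.compile(
            "https://www\\.ceskatelevize\\.cz/porady/[^/]*/(\\d{3})(\\d*)\\S*");

    private CcUrlSketch() {}

    // Builds the subtitle URL from a videoId; empty if it is not at least three digits.
    static Optional<String> fromVideoId(String videoId) {
        if (videoId == null || !videoId.matches("\\d{3,}")) {
            return Optional.empty();
        }
        String vid = videoId.substring(0, 3); // first three digits
        return Optional.of(SUBTITLE_BASE + vid + "/" + videoId + "/sub.vtt");
    }

    // Fallback: extracts the videoId from the page URL; empty if the URL has another format.
    static Optional<String> fromPageUrl(String pageUrl) {
        Matcher m = PAGE_URL.matcher(pageUrl);
        if (!m.matches()) {
            return Optional.empty();
        }
        return fromVideoId(m.group(1) + m.group(2));
    }

    // baseName is the video file name before the ".ts" extension is appended.
    static String subtitleFileName(String baseName) {
        return baseName + ".vtt";
    }

    public static void main(String[] args) {
        System.out.println(fromVideoId("223411033110112").orElse("no subtitle URL"));
        System.out.println(subtitleFileName("Prezidentske volby"));
    }
}
```

Returning an empty Optional instead of throwing makes it easy to skip the subtitle download silently when no usable videoId is available.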
Note that some programs, especially those not broadcast on TV after CC became legally mandated, like web exclusives and historical films, lack subtitles in the player settings, and the sub.vtt file URL returns a 404. Please add some check for that so that the HTML page containing "404 Not Found" is not downloaded instead. However, this is very rare and the error has practically no consequences except for a few confusing files on disk, so this is not critical.
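One possible form of that check, again only as a sketch: it assumes the plugin can get hold of the HTTP status code and the start of the response body (how it does that is up to the plugin's HTTP layer and not shown here). The class and method names are hypothetical. The check relies on the fact that a WebVTT file starts with the string "WEBVTT", optionally preceded by a byte order mark.

```java
// Hypothetical helper, NOT part of the existing plugin source.
final class CcResponseCheckSketch {
    private CcResponseCheckSketch() {}

    // Returns true only for a 200 response whose body starts like a WebVTT file.
    static boolean looksLikeWebVtt(int statusCode, String bodyStart) {
        if (statusCode != 200 || bodyStart == null) {
            return false; // e.g. the 404 case described above
        }
        // Skip an optional byte order mark before the signature.
        String text = bodyStart.startsWith("\uFEFF") ? bodyStart.substring(1) : bodyStart;
        return text.startsWith("WEBVTT");
    }
}
```

Checking the body as well as the status code also catches an error page that is served with status 200.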
I have never written a line of Java code and I don't think I can add this functionality myself. I just set up the JDK and keep getting errors on any code I write (as well as on any attempt to compile the plugin), because I simply don't understand the language or how to set up dependencies/libraries/modules in the IDE, despite having watched the plugin developer tutorials. Those tutorials are no longer available; I re-encoded them into a modern format, available for download here. Please rehost them!
Therefore, I am offering a 500 CZK bounty via bank transfer, PayPal or maybe some crypto to whoever successfully adds this functionality for all CC'd content at ceskatelevize.cz. I will also donate the same amount to Vity's FreeRapid Downloader project once this is done. Tagging Vity, tong2shot and birchie JPEXS as previous developers of this plugin to consider this first and foremost.

Edit 2023-01-31: updated some things after studying the source code, for example changed my term Episode ID to videoId (which is the actual variable name) to make it more straightforward for the developer. Unfortunately, there is little more I can do without learning Java and thoroughly reading through the source code, which I don't currently have time for. (Edit: I might also need to ask some questions about the project structure and how to set up the IDE because the IntelliJ setup tutorial does not work for the current version of the plugin)



Edited 9 time(s). Last edit at 02/03/2023 10:13AM by ChaoticNeutralCzech.