Odprti kop / Strip Mine launch (live blog)

I’m at Cyberpipe, liveblogging the launch of Strip Mine (Odprti kop), a service that makes the video produced by Slovenian national television searchable. It also lets bloggers link to that content, down to the exact second, from their own blogs…

For example, you could do this:

Milijonar: 12
They call you Miško, too. Yes. People have called me that ever since I was little, I don’t know why. Mojimir, Miško. Mojimir Miško Cilenše… Source: RTV Slovenija

Andraž is now talking more about the language technologies that help users find data on the site…

The beta service was available at http://www.rtvslo.si/odprtikop/beta/. See Update #1.

The company Andraž and Boštjan founded, Zemanta, is at http://www.zemanta.com/

The service knows how to link content to relevant sources, currently only Wikipedia. It knows not to link some names, because linking first names doesn’t really make sense.
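
The talk didn’t go into how the Wikipedia linking is implemented. As a rough illustration only, here is a minimal Python sketch, assuming an in-memory set of article titles (the service does keep the Slovenian Wikipedia in RAM) and a separate stop-list of first names. `WIKI_TITLES`, `FIRST_NAMES` and `link_entities` are hypothetical names with made-up sample data, and real Slovenian text would also need stemming or lemmatization to match inflected forms.

```python
from __future__ import annotations

import re

# Hypothetical sample data; the real service held the Slovenian Wikipedia in RAM.
WIKI_TITLES = {"Ljubljana", "Mojimir", "Slovenska filharmonija"}
# Hypothetical stop-list: first names are never linked on their own.
FIRST_NAMES = {"Mojimir", "Miško", "Andraž"}
MAX_TITLE_WORDS = 3  # longest title (in words) we try to match


def wiki_url(title: str) -> str:
    """Build a Slovenian Wikipedia URL for an article title."""
    return "https://sl.wikipedia.org/wiki/" + title.replace(" ", "_")


def link_entities(text: str) -> list[tuple[str, str]]:
    """Greedy longest-match linking of known titles, skipping lone first names."""
    words = re.findall(r"\w+", text)  # \w is Unicode-aware, so "Miško" stays whole
    links = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_TITLE_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            is_lone_first_name = n == 1 and candidate in FIRST_NAMES
            if candidate in WIKI_TITLES and not is_lone_first_name:
                links.append((candidate, wiki_url(candidate)))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no title starts at this word; move on
    return links


print(link_entities("Mojimir pravi, da je Ljubljana blizu. Slovenska filharmonija igra."))
```

Trying the longest match first means a multi-word title wins over any shorter title it contains, and the first-name check only blocks single-word matches.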

It also works out what the TV show is talking about, so it can link to pages on the http://www.rtvslo.si portal that cover the same topics.
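
How the topic matching works wasn’t explained. A common way to do this kind of “related articles” lookup is TF-IDF weighting with cosine similarity, so the sketch below uses that technique as a stand-in, assuming the portal’s articles are available as plain text. `related_articles` and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a real system would stem or lemmatize these."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def build_idf(docs):
    """Inverse document frequency, smoothed with +1 so no known term weighs zero."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """Term frequency times IDF; terms unseen in the articles get weight 0."""
    counts = Counter(tokenize(text))
    return {term: c * idf.get(term, 0.0) for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_articles(paragraph, articles, top_k=3):
    """Return up to top_k article titles that share weighted terms with the paragraph."""
    idf = build_idf(list(articles.values()))
    query = tfidf(paragraph, idf)
    scored = sorted(
        ((cosine(query, tfidf(body, idf)), title) for title, body in articles.items()),
        reverse=True,
    )
    return [title for score, title in scored[:top_k] if score > 0]


articles = {  # hypothetical sample "articles"
    "Volitve": "volitve parlament glasovanje",
    "Nogomet": "nogomet tekma gol",
    "Vreme": "dež sonce vreme",
}
print(related_articles("Tekma se je končala z golom, nogomet", articles))
```

Note that “golom” doesn’t match “gol” here; that kind of miss is exactly what the stemming and lemmatization in the stack are for.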

It is built with Python, MySQL, Apache and Django, and uses Lucene/PyLucene as the indexer, Snowball/PyStemmer for stemming, and Lematizator.
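
Lucene does the real indexing here and Snowball/PyStemmer the stemming. To show why stemming matters for an inflected language like Slovenian, here is a stdlib-only toy inverted index that maps word stems to (video, second) postings. The suffix list in `toy_stem` is a crude stand-in made up for this example, not the Snowball algorithm, and `SubtitleIndex`, the video IDs and the times are hypothetical.

```python
import re
from collections import defaultdict

# Made-up suffix list for illustration only; this is NOT the Snowball stemmer.
SUFFIXES = sorted(["ega", "emu", "ih", "om", "ov", "a", "e", "i", "o", "u"], key=len, reverse=True)
MIN_STEM = 3  # never cut a word down to fewer than three letters


def toy_stem(word):
    """Strip the longest listed suffix, keeping at least MIN_STEM letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


class SubtitleIndex:
    """Inverted index: stem -> set of (video_id, second) where it was said."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, video_id, second, text):
        for word in re.findall(r"\w+", text):
            self.postings[toy_stem(word)].add((video_id, second))

    def search(self, query):
        """Return the (video_id, second) pairs that contain every query word."""
        stems = [toy_stem(w) for w in re.findall(r"\w+", query)]
        if not stems:
            return []
        hits = set.intersection(*(self.postings.get(s, set()) for s in stems))
        return sorted(hits)


idx = SubtitleIndex()
idx.add("milijonar-12", 754, "Kličejo vas tudi Miško.")  # hypothetical IDs and times
idx.add("dnevnik", 30, "Vreme v Ljubljani")
print(idx.search("Miška"))
```

Because “Miška” and “Miško” reduce to the same stem, a search for one form finds the other, which plain word matching would miss.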

Content will only be added as it is created, which means all locally produced and subtitled shows.
The content will stay available as long as the videos are, and quoting will work as long as the service is online.

APIs might come, but they’re not there yet. The team intended to build them but hasn’t yet, since nobody would be using them. This means we will probably be able to co-create the APIs as we start using them. Comment on the service’s blog…

The service is completely open. It’s not fully featured; it’s “beta” but not really beta. A lot is still to come, but the service is launching today anyway.

Andraž says services like this are popping up, though not yet in the big media business.

He’s now talking more specifically about language technologies. A computer can read but might never understand what it’s reading. Faking that understanding still lets us enhance the experience of viewing content. Compared to the Semantic Web, this approach seems more practical at its core.

LinkedIn is an example where the experience could be better: currently it’s a passive service that you have to drive yourself. Andraž would like it to be active: search your calendar, find contacts, arrange meetings and simply tell the user about it. “I’m feeling lucky” services.

Interactivity: are we sure we want that from the computer? Don’t we just want it to be a better servant, one that takes orders and makes decisions by itself?

But.

These problems are really hard to tackle. They’re hard enough in an easy language, and we’re in Slovenia. What can we really do?

How about a service that automatically finds pictures for your current blog post, and maybe even processes them so they fit your design?

Andraž is now joined by Boštjan Špetič, Zemanta’s cofounder and CEO, who is talking about what they’re going to work on. He also thanked Zvezdan Matrič, MMC and Cyberpipe. The ideas are long-term, and there is no end to the possibilities these kinds of technologies provide.

And now the Q&A:
Q: reverse engineering the .890 subtitle format?
A: a hex editor to find the blocks, then trying to decode the character encoding of the letters.
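
For anyone trying the same thing, the hex-editor step can be approximated in a few lines of Python: dump the bytes with offsets to spot repeating blocks, then try decoding byte runs with a few candidate code pages. The encodings tried here (cp1250 and ISO 8859-2, common for Slovenian letters) are our guess; the real .890 files may well use their own character table, and the sample bytes are made up.

```python
def hexdump(data, width=16):
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{width * 3}} {ascii_part}")
    return "\n".join(lines)


def try_decodings(data, encodings=("cp1250", "iso8859_2", "latin_1")):
    """Decode the same bytes with several code pages to see which reads sensibly."""
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = data.decode(encoding)
        except UnicodeDecodeError:
            results[encoding] = None  # bytes are not valid in this encoding
    return results


sample = "Miško".encode("cp1250")  # hypothetical sample bytes: b"Mi\x9ako"
print(hexdump(sample))
for encoding, text in try_decodings(sample).items():
    print(encoding, repr(text))
```

Only one of the candidate encodings turns byte 0x9a into “š”; the others produce a control character, which is the kind of mismatch you look for when decoding an unknown format.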

Q: processing power?
A: it takes two hours in total: one hour to download and one hour to process. The Slovenian Wikipedia is small enough to hold in RAM, which makes processing faster. Linking to rtvslo.si news is limited to about 20,000 articles, and links are added to every paragraph.

Q: will the paragraphing heuristics change?
A: no, probably not.

Q: how much information is already in the subtitles? Timestamps?
A: all timestamps are based on the subtitles; in the future this could be done without them.
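
Since every timestamp comes from the subtitles, linking to “the exact second” comes down to turning a subtitle timecode into seconds. A minimal sketch, assuming broadcast-style HH:MM:SS:FF timecodes at 25 frames per second (the PAL rate). The `#t=` suffix borrows the W3C Media Fragments temporal syntax purely for illustration and is not Strip Mine’s actual URL scheme; `quote_link` and the example URL are hypothetical.

```python
PAL_FPS = 25  # assumption: PAL broadcast frame rate


def timecode_to_seconds(timecode, fps=PAL_FPS):
    """Convert an HH:MM:SS:FF timecode to seconds (frames become a fraction)."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError(f"frame number {frames} out of range for {fps} fps")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


def quote_link(video_url, timecode):
    """Hypothetical deep link that starts playback at the quoted whole second."""
    return f"{video_url}#t={int(timecode_to_seconds(timecode))}"


print(timecode_to_seconds("00:12:34:12"))
print(quote_link("http://example.com/video/milijonar-12", "00:12:34:12"))
```

Truncating to whole seconds keeps the link at or just before the quoted line, so playback never starts after the words being quoted.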

Q: did the subtitling process change?
A: no

Q: what about the future? Will there be voice recognition?
A: the focus is on smart processing; there might be some voice recognition to help define things.

Q: all automatic?
A: yes. No manual editing is done at the moment.

Q: will there be?
A: it’s possible. Maybe in the future journalists will edit the data.

Q: how much time was spent and what was the plan?
A: it started as an experiment in October 2006 and was rewritten for production once the prototype was done. The service gradually grew as it was developed.

Q: how about the pictures?
A: tricks. We know where the paragraph begins and ends, and that interval usually contains about 5000 frames. Which one do you take? You take the frame whose JPEG has a certain size: smaller ones are too blurry, and bigger ones are no good either.
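
The answer doesn’t say what “a certain JPEG size” is. One way to read it, sketched below under that assumption: encode candidate frames from the paragraph’s interval, then pick the frame whose JPEG size is closest to the median for that interval, so unusually small (blurry, flat) and unusually large frames both lose. `pick_thumbnail` and the sizes are hypothetical, and producing the sizes would need a video or image library, which is left out here.

```python
import statistics


def pick_thumbnail(frame_sizes):
    """Pick the frame whose JPEG size is closest to the interval's median size.

    frame_sizes: list of (frame_index, jpeg_size_in_bytes) pairs.
    """
    if not frame_sizes:
        raise ValueError("no frames to choose from")
    median_size = statistics.median([size for _, size in frame_sizes])
    best_index, _ = min(frame_sizes, key=lambda pair: abs(pair[1] - median_size))
    return best_index


# Hypothetical sizes for every 1000th frame of a ~5000-frame paragraph.
sizes = [(0, 8_000), (1000, 15_000), (2000, 21_000), (3000, 40_000), (4000, 16_000)]
print(pick_thumbnail(sizes))
```

Sampling every Nth frame rather than encoding all of them keeps the cost down when an interval holds thousands of frames.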

Q: is all text online?
A: everything in the show is, except the prebuilt subtitles.

Q: congrats. A lot of video on the web has no subtitles. What are you planning for the future, given that you’re in a niche market?
A: subtitling video is expensive, so we’re not going there. We’ll stay in niche markets: blogging, finding already tagged content, …

And we’re done.

Update #1:
Since the service has now launched, it’s available at http://www.rtvslo.si/odprtikop.

One Response to “Odprti kop / Strip Mine launch (live blog)”

  1. andraž says:

    Ha, unbelievable. Liveblogging and translating at the same time!

    Hopefully the video will be available by tomorrow, in Slovenian though…
