Sebastian’s Blog & Website

Research

Happy Easter 2010

Guess this is a pretty special Easter holiday this year; at least for me …

Two months ago I published my PhD thesis, on the usability of information discovery in Semantic Digital Libraries, as a book; a few days ago I received my copy of the hardcover version.

This is also the iPad weekend, and my book is also published in the ePUB format, which makes it iPad-ready.

There are also other things on the horizon, which I will let you know about pretty soon…

Therefore, I would like to wish you all a very Happy Easter.


(Psst, there is a 25% discount on my recent book as an Easter gift)


New book on Semantic Digital Libraries

You have probably heard by now, but if you haven’t, here is the big news: I have published a second book on Semantic Digital Libraries: Improving Usability of Information Discovery with Semantic and Social Services.

Unlike the previous one, it is not a compilation of articles contributed by myself and my colleagues; this book is based on my thesis. The book covers the most important aspects of what Semantic Digital Libraries are and what they can offer. I present a very thorough review of the literature describing various advanced digital library projects and components. Based on the identified requirements, I propose an architecture, a data model and a classification of ontologies for semantic digital libraries. I describe two example information retrieval and knowledge management techniques, which utilize semantic web and social networking technologies. Finally, I briefly describe the JeromeDL system and provide a very detailed and thorough analysis of an evaluation comparing the usability of end-user services offered by a semantic digital library with those offered by a classic digital library.

The book is available for purchase at lulu.com.
Support independent publishing: Buy this book on Lulu.

Since I understand that not everyone can afford to buy a complete, hardcover version, I have prepared an array of different versions of this book (see below), ranging from Hardcover through Paperback to E-Book. There is also a lite version of this book, which does not contain the attachments.

                Hardcover                      Paperback                      E-Book
Full version    ISBN: 978-1-4452-7770-7, €50   ISBN: 978-1-4452-8243-5, €30   PDF, €15
Lite version*   ISBN: 978-1-4452-8864-2, €40   Lite Paperback version, €20    ePUB**, €15

Amazon Kindle: $14.99 + taxes (purchase at Amazon)

* NOTE: The lite version does not contain the appendix

** Open standard format, compatible with, e.g., iPad; DRM-enabled, compatible with Adobe Digital Editions

I hope you will find this book a good read and very helpful for your studies and work.

(Pssst, come and visit our Facebook group tomorrow: you will learn how to get the book cheaper over the next couple of days)



New “home” for Semantic Digital Libraries

January was madness: I was hoping for this year to be more relaxed after all the rush with the thesis and the company in 2009. But it did not start like one; I guess I should blame my workaholic attitude. Maybe I should do what my supervisor once did: hang up a note (where I can see it) saying “Just say NO”.

I have been asked many times recently about materials regarding Semantic Digital Libraries but could not really point to just one location. Yes, there are all my papers at http://library.deri.ie (which happens to be down quite often ever since I left the institute), and many of my presentations are on SlideShare, and then there is my book, and my thesis, and my tutorials … I realized that there has to be (finally) a place where I can gather and reference all of that.

And here it is: http://semdl.info/. At the moment you will find there all the major presentations I gave on the topic, archived information about the tutorials we gave (together with complete slides from most of them), and references to my two books on Semantic Digital Libraries. Most likely I will use our infrastructure to set up a JeromeDL with all the papers on the subject and reference them there.

But this is not the end: I do not want it to be a one-man show. I hope that everyone else who is interested in the subject will help me fill the site with more materials and bring it to life (someone already suggested a blog :) ). Please let me know if you want to join the effort.


What was my thesis about?

In case you were wondering what my PhD thesis was about, here are some tips (thanks to Wordle):


My research and knowledge workers

Just had a very interesting conversation on how my research could help knowledge workers, e.g., people responsible for producing documentation in large corporations.

I have to say, I am glad that whatever I wrote here helped someone associate their work with my research. To be frank, when I invented notitio.us 2 years ago, knowledge workers were among the people I was thinking about.

(more…)


My PhD Viva Voce on Video

A few days ago I found enough time to write about the last stages of my PhD process, namely my viva voce and what followed afterwards.

A friend of mine, Lukasz, told me that he still had the videos he recorded during my presentation. He was kind enough to publish them on Vimeo.

Here they are:

Sebastian’s PhD Deffence 1 from Lukasz Porwol on Vimeo.

(more…)


Tutorial on Semantic Digital Libraries at ICSD’09

Last week I travelled to Trento, Italy.

I was invited by the organizers of the International Conference for the Semantic Web and Digital Libraries (ICSD’09) to give a full-day tutorial on Semantic Digital Libraries.

The key goal of this conference is to bring together researchers and practitioners working on solutions that span these two worlds: Semantic Web and Digital Libraries. Even though these two research lines have so much in common, getting to a joint mindset has proved to be quite a bit of a problem.

(more…)


My PhD Viva Voce (a.k.a PhD thesis defense)

A few months ago (in May) I had to face the most important point on my way to a PhD – the viva voce (i.e., “thesis defense”).
I could not get around to posting about it, mostly due to the many activities related to our startup and to the final procedures related to the thesis itself.

Now, as my PhD thesis manuscript has reached the camera-ready version, has been copy-edited (thanks Miechelle and Andrzej), printed, hard-bound (thanks Lukasz and Iza), and submitted (thanks Hilda for taking over from there), I can finally devote some free mind-cycles to not-so-strictly-PhD-related matters, e.g., taking care of my blog, which has not been “watered” for quite some time.

As time went by, I felt like I had to finally let you all know how it went. So here is the story; I will pick it up from where I left off in May, a week before the viva voce.

We (Ewelina was with me all the way to support and help me) flew in to Cork and then went up to Galway just before the weekend (May 16th-17th). I longed to see all our friends in Galway, but … first things first – I had to polish up my presentation for the viva (thanks Stefan for your valuable comments).
On Saturday we picked up Daniel Schwabe from Shannon airport and visited Bunratty Castle near Limerick. On Sunday, my second reviewer, Prof. Henryk Krawczyk, arrived in Galway.

(more…)


Topic proposals for the Jesienne Spotkania PTI (PTI Autumn Meetings)

As every year, I am asking you to vote for the topics I have submitted to the Jesienne Spotkania PTI.

(more…)


Next week: my PhD defense (Semantic Digital Libraries)

The day has come to wrap up the research I have been doing for the last couple of years. After closing the write-up and submitting my thesis in February this year, my PhD defense is approaching very soon.

Here is some information about it. If you are around Galway next Monday – you are welcome to come and watch my presentation at the open session.

When: May 18th, 2009

Where: Conference Room, DERI, NUI Galway (IDA Business Park, Lower Dangan, Galway, Ireland)

Schedule:

  • 10.00 – 10.30 – Open session
  • 10.30 – 11.30 – Closed session – me & the 3 examiners (fingers crossed!)
  • 11.30 – 12.00 – Closed session – examiners only

Abstract:

Until recently, libraries were the prime source of information for both students and scholars. Now this information is published online using digital library systems. Current digital libraries have to provide efficient information discovery solutions to adapt to the fast development of new technologies; they also have to cater to the current generation of students. The research on the Semantic Web and online social networks contributes to the digital libraries domain by supporting interoperability with formal semantics, improving interlinking of information and encouraging users to contribute and share knowledge.

Semantic technologies support more flexible information management than that offered by classic digital libraries. Information on library resources can be gathered from heterogeneous sources, including contributions from the communities of library users. These annotations, combined with legacy data, build the foundations for more efficient information discovery in digital libraries.

This thesis reviews architectures, abstract models, metadata standards, and various technologies for building digital library management systems. We derive requirements for advanced digital libraries and propose an architecture model and a set of ontologies for semantic digital libraries. Finally, we present information discovery services using semantic and social technologies, and the prototype implementation of a semantic digital library that fulfills the aforementioned requirements.

Our hypothesis is that semantic and social technologies applied to a digital library management system deliver more efficient information discovery solutions, while the library users become more satisfied and can remember more of the information they have learned when using the library. We present two information discovery services that use semantic and social technologies; we also show a prototype of a semantic digital library. We support our hypothesis by discussing the results of initial evaluations of both services and a comprehensive evaluation of the semantic digital library prototype.


If you are interested in doing a PhD yourself, check a collection of online PhD programs.


Why do I prefer ntriples?

A thought experiment (actually I had to do this just a minute ago): you have a number of publications backed up from JeromeDL. Each publication is in a separate folder, named after the ID of the publication. Inside you will find a Dublin Core file (XML), a couple of binary files (PDFs and such), and an RDF description of the resource.

The task: Map a title to each resource using anything you can get on MacOSX or Linux.

Solution: The RDF description in JeromeDL is exported in the ntriples format, which means one statement per line. Therefore the solution is a very simple workflow:

  1. find the RDF files
  2. prepare grep command
  3. execute

Which on any UNIX system will translate into:
find . -name "rdf.abstract.ntriples" | awk '{print "grep \"xontology#hasTitle\"",$0}' | sh -

Teaser: Try to do that in as little time as I did, using RDF/XML serialization and/or Windows. Good luck.
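For readers who would rather script the same idea than pipe commands together, here is a minimal sketch in Python. It relies on the same property as the grep above: one statement per line. The file name and the "xontology#hasTitle" fragment come from the command above; the regular expression, the function name and the assumption that titles are plain quoted literals are mine, so treat it as an illustration rather than the way JeromeDL does it.

```python
import os
import re

# Assumption: the title predicate URI ends with "xontology#hasTitle" (as in the
# grep above) and the title itself is a quoted literal on the same line.
TITLE_LINE = re.compile(r'^(\S+)\s+<[^>]*xontology#hasTitle>\s+"((?:[^"\\]|\\.)*)"')


def titles_by_publication(root, filename="rdf.abstract.ntriples"):
    """Map each publication folder name (its ID) to the titles found in its ntriples file."""
    titles = {}
    for folder, _subdirs, files in os.walk(root):
        if filename not in files:
            continue
        pub_id = os.path.basename(folder)
        with open(os.path.join(folder, filename), encoding="utf-8") as handle:
            # One statement per line, so a line-by-line scan is enough.
            for line in handle:
                match = TITLE_LINE.match(line)
                if match:
                    titles.setdefault(pub_id, []).append(match.group(2))
    return titles
```

Calling titles_by_publication(".") in the backup folder returns a dictionary keyed by publication ID. The point stays the same: no RDF parser is needed, because each line is a complete statement.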


What is keeping me so busy recently? (1)

One of the things missing in my PhD thesis was a chapter on the architecture of a generic SemDL. But before you can move on to SemDL architecture, you first need to understand the research on “classic” DLs and what the plans for the future are. In order to do that, I had to:

1) Take this pile of almost 100 articles on digital libraries architectures, read them, annotate them


DL Arch (source)
Originally uploaded by skruk

2) Compile all this information into a mindmap


DL Arch (mindmap)
Originally uploaded by skruk

3) Write up the results into a missing chapter in my thesis :)


Why social Internet in Poland might not be a good idea?

Before I came to DERI, I was working on the semantic digital library project called Elvis-DL (now JeromeDL). My idea was to add a feature allowing registered users to share their opinions on the resources in the digital library, and hence build up the knowledge around it.

The response I got at that time was that it would be a bad idea, as free comments would lead only to “spam” texts, or worse, comments expressed in bad language, etc. In other words, useless.

Shortly after I started developing social features for JeromeDL in DERI, I also added this “blog-around” feature. Everyone was pleased.

Why?

Because in the social Internet around the world, your comments on blogs present who you are, and people tend to express themselves politely. Hence, their comments are if not useful, at least nice to the others.

Compare comments on Flickr with comments on wp.pl (a large Polish information portal). Huuge difference, right?

I am in the process of evaluating my solutions (both semantic and social) for digital libraries, those that have been implemented in JeromeDL. I have sent information around, using every information channel I could think of. Most of them were social networking sites like Facebook, LinkedIn, GoldenLine, and the mailing lists of all the social and semantic web groups known to me.

One of my colleagues, helping me with the evaluation, suggested I should also send this information to the new and fast-growing social networking site called nasza-klasa.pl.

I have to confess, I was not too convinced by this idea; but since I wanted more people to help me, I reluctantly sent the information there as well, to the fora of my universities (GUT and NUIG) and of my high school.

Shortly after, a very stupid (and frustrated, if you ask me) comment appeared on the GUT forum.

For me it is a clear example that social solutions built in Poland cannot be left without moderation.

Sad, isn’t it?


—

16 January 2008 17:46

In the aftermath of this stupid conversation, sadly enough supported by other people with a strange attitude, and having had no answer from the moderators of the system, I have decided to do the only reasonable thing: remove my profile from that site, since it was impossible to continue the conversation the way it was going.

Shame on you nasza-klasa.pl.

16 January 2008 23:59

Maciej has just sent me a copy of the conversation that continues on nasza-klasa.pl. It is nice to see that other people share my understanding of “cultural” conversation.
I could re-register on the portal, but I will not do that.
Consider this my protest against the zero reaction from the moderators.
I do not claim that people who misbehave should be removed all of a sudden, but stepping in to close a pointless conversation would be enough.
As Jaroslaw said, I removed my account because being part of a social network means, for me, identifying with the people there and using the SN for that purpose. If I cannot talk to other people without being abused for using technical language, what is the purpose of using such an SN? Good luck nasza-klasa, but you just lost a very strong supporter of social networking. I am going back to Facebook, LinkedIn, and GoldenLine; see you all there.

17 January 2008 09:28

As I can see, the discussion continues. This time the pen was taken up by some weird guy who cannot even spell “evaluations”. Congratulations; how can such a person be subscribed to a university forum? The funniest part is that he claims that my evaluation was set up only for computer scientists; well, it was not. It was set up for people who have some more understanding of the current Internet. The truly sad part is that the people who felt offended by not understanding my post were actually computer scientists (or so they claim). And of course, he could not write even a short post without using “french” (yeah, I still know what “chgw” means)


Evaluation of social and semantic technologies in digital libraries

I would like to invite everyone to take part in the evaluation of the semantic and social technologies for digital libraries. The evaluation benchmarks search and browsing solutions delivered in our semantic digital library called JeromeDL [http://www.jeromedl.org/] against standard services offered by one of the most popular open-source libraries – DSpace.

Please feel free to enter http://q.digime.name/ and help us with evaluating our prototype solutions.


Looking for a Researcher – Software Developer [JeromeDL project] (position closed)

Researcher – Software Developer

(please note that the recruitment process is now closed)

The Digital Enterprise Research Institute (DERI) is the largest semantic research organisation in the world. DERI offers a stimulating, dynamic, multi-cultural research environment with excellent ties to research groups worldwide. This is a unique opportunity to join the effort of bringing research prototypes to an industry-ready state within DERI, which, in collaboration with our research and industrial partners, will play a key role in making next-generation semantic computing systems a reality. DERI offers a unique opportunity to develop one’s career in a world-renowned and industry-strong research environment.
 
The Person
• Ability and willingness to work in an international, team-based environment developing state-of-the-art software solutions on time and to specification
• Motivated and proactive attitude to take ownership and initiative in all work assignments
• Excellent analysis and problem solving skills
• Strong design, development & testing skills
• Excellent communication skills, verbal and written
• Excellent command of English, both verbal and written
• Ability to tackle wide and varied tasks
• Creative Thinking

Essential Skills
• Solid industry experience using many of the following:
• Very strong core Java
• Web based UI: JSP/Servlets/Applets/JavaScript/AJAX
• Good expertise with automated testing frameworks such as JUnit
• Good knowledge and experience with Semantic technologies
• Good knowledge of object-oriented design principles and design patterns with an understanding of their application within Java
 
Desirable Experience & Background (inc. qualifications):
• Knowledge/Experience with distributed systems and service-oriented design principles
• Knowledge of user interface design principles
• Experience/Knowledge of document processing and search techniques
• Experience/Knowledge of XML processing and related technologies
• A relevant post graduate degree (MSc) or relevant industrial experience

The position is full-time, located at DERI Galway. The duration of the post will be 9 months in the first instance. The salary is commensurate with qualifications and experience. An early start date is preferable as the position is now open. A panel for future similar positions may be formed.
Informal enquiries about these positions may be made to:
Sebastian Ryszard Kruk, Researcher and Project Manager, Tel +353-91-495213
sebastian.kruk@deri.org

Application procedure: Candidates are requested to submit a covering letter, CV (Word or PDF format only) and the names and addresses of at least three and not more than five referees via e-mail to:
hr.ie@deri.org


Ontology Development – Do we collaborate?

One of the still unresolved problems in MarcOnt Portal is how to integrate suggestions from the community into a new release of the ontology: which suggestions will conflict, how to choose the ones to be used, etc. The MarcOnt Portal group is still fighting with the code base: switching to the new SemVersion, fighting with the FOAFRealm+SemVersion integration, and hardening the implementation. And we keep forgetting about that question: how to design an algorithm for semi-automated agreement on new versions of ontologies.

I guess, we should start with the definition of the ontology:

An ontology is a specification of a conceptualization. (…) Practically, an ontological commitment is an agreement to use a vocabulary (i.e., ask queries and make assertions) in a way that is consistent (but not complete) with respect to the theory specified by an ontology.
Tom Gruber “What is an Ontology?”

In other words, an ontology should be based on agreement among the community of experts in the specific domain.

Should we try to talk to as many domain experts from various ontology development groups as possible, and see how they do that in practice?

There is some research done in that area already; take the following article as an example.

But, when we look into an average ontology development – is it always based on the community agreement? I hope we will figure it out soon, as it is the cornerstone of our future research on MarcOnt Portal.


What is the difference?

The concerns around the efficiency of RDF storages are not new; many people I meet ask me if they are scalable enough to be used in industrial solutions.
I was not sure about it for a long time. I was not happy with Jena, so we switched JeromeDL and FOAFRealm to Sesame. It showed some improvement. I was hoping to switch to YARS, but being unable to write to this storage kept me at bay.
Anyway, over time my confidence in the scalability of RDF storages grew. When DERI announced the breakthrough with SWSE/YARS2, I felt pretty confident that we had reached the stage where the industrial world can start building upon Semantic Web technologies.

And so, I became reckless. Until only recently …

During my summer holiday, just to play around a little, I made some changes in the TagsTreeMaps (TTM) component, preparing it for the evaluation that I will need for my thesis. Since a broadband connection and a sunny environment are mutually exclusive (at least they were in my case), I switched from the original del.icio.us tagging provider module, developed last year, to an internal notitio.us provider module. The latter operated on the RDF storage (Sesame) with a copy of my taggings, and those of some of my colleagues, from del.icio.us. The graph with taggings was built following Tom Gruber’s Tagging ontology (TagCommons).

If you have happened to play with TTM at any time in the past, you know that what is required in the first step is a list of all tags by a given user, with the number of times each tag has been used. Since none of the RDF query languages supported by Sesame (at least to my knowledge) allows for aggregations like COUNT(*), I decided to do the counting myself. Still, I needed a list of all tags.
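Doing the counting yourself is not much work once the query returns one label per tagging. The sketch below shows the idea in Python; the row values are made up for illustration, and TTM itself does this in Java, so this is not the actual TTM code.

```python
from collections import Counter


def count_tags(terms):
    """Count how many times each tag label occurs in a flat list of query results."""
    return Counter(terms)


# Hypothetical result rows: one label for every (tagging, term) pair the query returns.
rows = ["semweb", "rdf", "semweb", "library", "rdf", "semweb"]
counts = count_tags(rows)
print(counts.most_common())  # [('semweb', 3), ('rdf', 2), ('library', 1)]
```

This only works if the query keeps duplicate rows, since the repetitions are exactly what gets counted.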

The obvious query, to me, was the following:

SELECT term
FROM
{document} tagging:hasTagging {tagging},
{tagging} dc:creator {<USER-ID>};
tagging:hasTerm {} rdfs:label {term}
USING NAMESPACE
tagging = <http://ttm.corrib.org/tagging#>,
dc = <http://purl.org/dc/elements/1.1/>

In other words, for all documents tagged by the user with the given USER-ID, get all literals representing the tags used in these taggings.

To my surprise the whole application slowed down to a snail’s pace. Why? Quick profiling with Logger in the right places of the algorithm (I could not get the Tomcat profilers in Eclipse running on my Mac) gave a hint that it is the query execution by Sesame that takes ages.
I even posted this query through the web interface of Sesame. The result was even worse: 25k ms (!) to compute the query for roughly 400+ documents with 2.5 tags each (on average). That is BAD.

Luckily, I am blessed with a group of people (apparently) smarter than me working under my supervision in my SemInf Lab in DERI.
I described the problem to Maciej, and asked him what query he would write. His response was:

SELECT term
FROM
{tagging} dc:creator {<USER-ID>};
tagging:hasTerm {} rdfs:label {term}
USING NAMESPACE
tagging = <http://ttm.corrib.org/tagging#>,
dc = <http://purl.org/dc/elements/1.1/>

… and Sesame managed to compute it, giving the same results (!) in 200ms (!!!!!).

The question is: can I use his query instead of mine? Quick answer: YES,

… but what if the RDF does not conform to our ontology? For example, there could be resources with dc:creator and tagging:hasTerm properties that are not of type Tagging and are not associated with any document. Unlikely to happen in the old world of SQL, but quite possible in the open Semantic Web environment.
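To make the difference between the two queries concrete, here is a small Python sketch over a hand-made set of triples. The shortened property names, the resource names and both helper functions are hypothetical stand-ins for the real URIs and for what Sesame does; sets are used, so duplicates are ignored and only the question "which labels match" is illustrated.

```python
# Hypothetical, shortened property names standing in for the full URIs.
HAS_TAGGING, CREATOR, HAS_TERM, LABEL = "hasTagging", "creator", "hasTerm", "label"

triples = {
    ("doc1", HAS_TAGGING, "tagging1"),
    ("tagging1", CREATOR, "user1"),
    ("tagging1", HAS_TERM, "term1"),
    ("term1", LABEL, "rdf"),
    # A stray resource: it has a creator and a term, but no document points to it.
    ("stray", CREATOR, "user1"),
    ("stray", HAS_TERM, "term2"),
    ("term2", LABEL, "noise"),
}


def objects(subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}


def labels_short(user):
    """Shorter query: anything with the given creator and a labelled term."""
    return {label
            for tagging in subjects(CREATOR, user)
            for term in objects(tagging, HAS_TERM)
            for label in objects(term, LABEL)}


def labels_full(user):
    """Longer query: additionally require that some document hasTagging the resource."""
    attached = {o for s, p, o in triples if p == HAS_TAGGING}
    return {label
            for tagging in subjects(CREATOR, user) & attached
            for term in objects(tagging, HAS_TERM)
            for label in objects(term, LABEL)}
```

On this data labels_full("user1") yields only "rdf", while labels_short("user1") also picks up "noise" from the stray resource. On data that conforms to the ontology the two return the same labels, which is why the shorter query is safe for a controlled evaluation.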

For the purpose of the evaluation of TTM I will stick to Maciej’s query. Hopefully, there will be some better solution out there by the time notitio.us goes commercial.


How I love Tomcat (did I say love? I hate it)

This is not the first time that the Tomcat team has decided to make our life easier and change the way Tomcat 6 works (compared to T5.5).

A couple of notes from our (just finished) session on how to make JeromeDL work on T6.

  1. apart from small changes required here and there in JSP (like changing ${ (test)?one:two} -> ${ (test)?(one):(two)} ) – T6 seems to be much faster than T5.5
  2. T6 introduced a new way (they say it is a feature) of handling internationalization, but it breaks the common-sense way fmt:bundle used to work. Now you cannot do .getKeys() or bundle.keys, because this new object, which claims to be a ResourceBundle, is somehow mapped in EL to something behaving like a Map. So bundle.keys returns ???keys???, indicating that such a translation has not been found. Stupid.

    Together with Adam we wrote a helper function in the Tag Lib to make sure we get an Enumeration from the bundle; it was required by the JavaScript internationalization style we have, e.g., in SSCF

  3. T6 has problems handling long URLs that contain URL-encoded fragments. If you have %2F as a result of URL-encoding a slash, it will fail to load the page, with error 400 -> wrong URL no Slash. Again, stupid. I will try to find some solution soon.

Please let me know if anyone has any idea how to fix point 3.
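To show what kind of URL point 3 is about, here is a small Python sketch that builds one. The base URL, the "resource" path and the "id" parameter are hypothetical names, not JeromeDL's real ones. The second function moves the same value into the query string; that this avoids the 400 error on a given Tomcat 6 setup is an assumption I have not verified, so treat it as something to try rather than a fix.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8080/jeromedl"  # hypothetical deployment URL


def as_path_segment(resource_id):
    # Encodes "/" as %2F inside the path: the shape of URL described in point 3.
    return f"{BASE}/resource/{quote(resource_id, safe='')}"


def as_query_parameter(resource_id):
    # Carries the same value in the query string instead of the path.
    return f"{BASE}/resource?{urlencode({'id': resource_id})}"


print(as_path_segment("books/42"))     # http://localhost:8080/jeromedl/resource/books%2F42
print(as_query_parameter("books/42"))  # http://localhost:8080/jeromedl/resource?id=books%2F42
```

Both URLs carry the same encoded slash; the only difference is whether it sits in the path or in the query string.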



Recreational computing

There is so much going on recently in DERI/eLITE/Corrib, so many things I would like to write about … but I can’t. At least not until all those strange IP policies get firm borders stating what we researchers can tell and what we should not.

Until then, trying to avoid any NUIG IP policy land mines I might stumble upon, I can only say that I have come back to unlimited-fun-generating activities: research & development. I have managed to attract a group of skillful researchers to take care of each of the projects I set up some time ago: JeromeDL (Tomasz), FOAFRealm and HyperCuP (Sławek), MarcOnt and a very new one, S3B (Adam).

Now I can relax slightly from some of the management responsibilities and spend some time on “recreational computing“. For me it is a combination of all the things I really like: maths, user interface design, web programming/prototyping, and … inventing.

I hope I will be allowed to publish some of my recent ideas and prototypes. Until that time, unless you are from NUIG/DERI, sorry … you have got to trust me: I am having great fun (although it involves working 10-12h/day)