
May 14 2015

15:37

SNB Interactive, Part 2 - Modeling Choices

SNB Interactive is the wild frontier, with very few rules. This is necessary, among other reasons, because there is no standard property graph data model, and because the contestants support a broad mix of programming models, ranging from in-process APIs to declarative query.

In the case of Virtuoso, we have played with SQL and SPARQL implementations. For a fixed schema and well known workload, SQL will always win. The reason is that SQL allows materialization of multi-part indices and data orderings that make sense for the application. In other words, there is transparency into physical design. An RDF/SPARQL-based application may also have physical design by means of structure-aware storage, but this is more complex and here we are just concerned with speed and having things work precisely as we intend.

Schema Design

SNB has a regular schema described by a UML diagram. This has a number of relationships, of which some have attributes. There are no heterogeneous sets, i.e., no need for run-time typed attributes or graph edges with the same label but heterogeneous end-points. Translation into SQL or SPARQL is straightforward. Edges with attributes (e.g., the foaf:knows relation between people) would end up represented as a subject with the end points and the effective date as properties. The relational implementation has a two-part primary key and the effective date as a dependent column. A native property graph database would use an edge with an extra property for this, as such edges are typically supported.

The only table-level choice has to do with whether posts and comments are kept in the same or different data structures. The Virtuoso schema uses a single table for both, with nullable columns for the properties that occur only in one. This makes the queries more concise. There are cases where only non-reply posts of a given author are accessed. This is supported by having two author foreign key columns each with its own index. There is a single nullable foreign key from the reply to the post/comment being replied to.
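To make the design concrete, here is a minimal sketch of such a combined table in SQL. It is illustrative only: apart from ps_creatorid, ps_creationdate and ps_postid, which the next paragraph mentions, the table and column names are assumptions rather than the actual Virtuoso schema.

```sql
-- Illustrative only: one table holds both posts and comments, with nullable
-- columns for the properties that occur in only one of the two.
CREATE TABLE post (
  ps_postid        BIGINT PRIMARY KEY,
  ps_creatorid     BIGINT,   -- author, when this row is a non-reply post
  ps_c_creatorid   BIGINT,   -- author, when this row is a comment/reply
  ps_replyof       BIGINT,   -- nullable FK to the post/comment being replied to
  ps_creationdate  BIGINT,   -- milliseconds since the epoch, as discussed below
  ps_content       VARCHAR,
  ps_imagefile     VARCHAR   -- only non-reply posts carry an image
);
```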

The workload has some frequent access paths that need to be supported by index. Some queries reward placing extra columns in indices. For example, a common pattern is accessing the most recent posts of an author or a group of authors. There, having a composite key of ps_creatorid, ps_creationdate, ps_postid pays off since the top-k on creationdate can be pushed down into the index without needing a reference to the table.
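As an illustration (not the exact Virtuoso DDL), an index of this shape covers such a query completely, so the descending top-k on creation date can be evaluated inside the index without ever touching the base table:

```sql
-- Illustrative composite index: author, then creation date, then post id.
CREATE INDEX post_creator_date ON post (ps_creatorid, ps_creationdate, ps_postid);

-- 20 most recent posts of one author. Every referenced column is in the
-- index, and the descending top-k can stop after 20 index entries.
SELECT ps_postid, ps_creationdate
  FROM post
 WHERE ps_creatorid = ?
 ORDER BY ps_creationdate DESC
 LIMIT 20;
```

In Virtuoso's own SQL dialect the top-k would typically be written with TOP rather than LIMIT, but the index-only evaluation is the same idea.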

The implementation is free to choose data types for attributes, particularly datetimes. The Virtuoso implementation adopts the practice of the Sparksee and Neo4j implementations and represents these as a count of milliseconds since the epoch. This is less confusing, faster to compare, and more compact than a native datetime datatype that may or may not have timezones, etc. Using a built-in datetime seems to be nearly always a bad idea. A dimension table or a number for a time dimension avoids the ambiguities of a calendar, or at least makes these explicit.

The benchmark allows procedurally maintained materializations of intermediate results for use by queries as long as these are maintained transaction-by-transaction. For example, each person could have the 20 newest posts by their immediate contacts precomputed. This would reduce Q2 "top of the wall" to a single lookup. This does not however appear to be worthwhile. The Virtuoso implementation does do one such materialization for Q14: A connection weight is calculated for every pair of persons that know each other. This is related to the count of replies by either to content generated by the other. If there does not exist a single reply in either direction, the weight is taken to be 0. This weight is precomputed after bulk load and subsequently maintained each time a reply is added. The table for this is the only row-wise structure in the schema and represents a half-matrix of connected people, i.e., person1, person2 -> weight. Person1 is by convention the one with the smaller p_personid. Note that comparing IDs in this way is useful but not normally supported by SPARQL/RDF systems. SPARQL would end up comparing strings of URIs with disastrous performance implications unless an implementation-specific trick were used.
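A sketch of what such a precomputed half-matrix could look like; the table and column names, and the exact weight formula, are illustrative assumptions rather than the actual schema:

```sql
-- Illustrative only: one row per pair of people who know each other, with
-- person1 < person2 by convention so each pair is stored exactly once.
CREATE TABLE knows_weight (
  kw_person1  BIGINT,
  kw_person2  BIGINT,
  kw_weight   INT,     -- derived from reply counts; 0 if no replies either way
  PRIMARY KEY (kw_person1, kw_person2)
);

-- One possible maintenance step when a reply is inserted: adjust the weight
-- of the ordered pair formed by the reply's author and the author of the
-- post or comment being replied to.
UPDATE knows_weight
   SET kw_weight = kw_weight + 1
 WHERE kw_person1 = ? AND kw_person2 = ?;
```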

In the next installment, we will analyze an actual run.

15:37

SNB Interactive, Part 1 - What is SNB Interactive Really About?

This is the first in a series of blog posts analyzing the Interactive workload of the LDBC Social Network Benchmark. This is written from the dual perspective of participating in the benchmark design, and of building the OpenLink Virtuoso implementation of same.

With two implementations of SNB Interactive at four different scales, we can take a first look at what the benchmark is really about. The hallmark of a benchmark implementation is that its performance characteristics are understood; even if these do not represent the maximum of the attainable, there are no glaring mistakes; and the implementation represents a reasonable best effort by those who ought to know, namely the system vendors.

The essence of a benchmark is a set of trick questions or "choke points," as LDBC calls them. A number of these were planned from the start. It is then the role of experience to tell whether addressing these is really the key to winning the race. Unforeseen ones will also surface.

So far, we see that SNB confronts the implementor with choices in the following areas:

  • Data model — Tabular relational (commonly known as SQL), graph relational (including RDF), property graph, etc.

  • Physical storage model — Row-wise vs. column-wise, for instance.

  • Ordering of materialized data — Sorted projections, composite keys, replicating columns in auxiliary data structures, etc.

  • Persistence of intermediate results —  Materialized views, triggers, precomputed temporary tables, etc.

  • Query optimization — join order/type, interesting physical data orderings, late projection, top k, etc.

  • Parameters vs. literals — Sometimes different parameter values result in different optimal query plans.

  • Predictable, uniform latency — Measurement rules stipulate that the SUT (system under test) must not fall behind the simulated workload.

  • Durability — How to make data durable while maintaining steady throughput, e.g., logging, checkpointing, etc.

In the process of making a benchmark implementation, one naturally encounters questions about the validity, reasonability, and rationale of the benchmark definition itself. Additionally, even though the benchmark might not directly measure certain aspects of a system, making an implementation will take a system past its usual envelope and highlight some operational aspects.

  • Data generation — Generating a mid-size dataset takes time, e.g., 8 hours for 300G. In a cloud situation, keeping the dataset in S3 or similar is necessary; re-generating every time is not an option.

  • Query mix — Are the relative frequencies of the operations reasonable? What bias does this introduce?

  • Uniformity of parameters — Due to non-uniform data distributions in the dataset, there is easily a 100x difference between "fast" and "slow" cases of a single query template. How long does one need to run to balance these fluctuations?

  • Working set — Experience shows that there is a large difference between almost-warm and steady-state of working set. This can be a factor of 1.5 in throughput.

  • Reasonability of latency constraints — In the present case, a qualifying run must have no more than 5% of all query executions starting over 1 second late. Each execution is scheduled beforehand and done at the intended time. If the SUT does not keep up, it will have all available threads busy and must finish some work before accepting new work, so some queries will start late. Is this a good criterion for measuring consistency of response time? There are some obvious possibilities for abuse.

  • Ease of benchmark implementation/execution — Perfection is open-ended and optimization possibilities infinite, albeit with diminishing returns. Still, getting started should not be too hard. Since systems will be highly diverse, testing that these in fact do the same thing is important. The SNB validation suite is good for this and, given publicly available reference implementations, the effort of getting started is not unreasonable.

  • Ease of adjustment — Since a qualifying run must meet latency constraints while going as fast as possible, setting the performance target involves trial and error. Does the tooling make this easy?

  • Reasonability of durability rule — Right now, one is not required to do checkpoints but must report the time to roll forward from the last checkpoint or initial state. Inspiring vendors to build faster recovery is certainly good, but we are not through with all the implications. What about redundant clusters?

The following posts will look at the above in light of actual experience.

May 05 2015

15:13

Thoughts on KOS (Part 3): Trends in knowledge organization

The accelerating pace of change in the economic, legal and social environment, combined with tendencies towards increased decentralization of organizational structures, has had a profound impact on the way we organize and utilize knowledge. The internet as we know it today, and especially the World Wide Web as the multimodal interface for the presentation and consumption of multimedia information, are the most prominent examples of these developments. To illustrate the impact of new communication technologies on information practices, Saumure & Shiri (2008) conducted a survey on knowledge organization trends in the Library and Information Sciences before and after the emergence of the World Wide Web. Table 1 shows their results.

[Table 1: Knowledge organization trends in Library and Information Studies, pre- and post-Web (Saumure & Shiri 2008)]

The survey illustrates three major trends: 1) the spectrum of research areas has broadened significantly from originally complex and expert-driven methodologies and systems to more light-weight, application-oriented approaches; 2) while certain research areas have kept their status over the years (e.g. Cataloguing & Classification or Machine Assisted Knowledge Organization), new areas of research have gained importance (e.g. Metadata Applications & Uses, Classifying Web Information, Interoperability Issues), while formerly prevalent topics like Cognitive Models or Indexing have declined in importance or dissolved into other areas; and 3) the quantity of papers explicitly and implicitly dealing with metadata issues has significantly increased.

These insights coincide with a survey conducted by The Economist (2010) that comes to the conclusion that metadata has become a key enabler in the creation of controllable and exploitable information ecosystems under highly networked circumstances. Metadata provides information about data, objects and concepts. This information can be descriptive, structural or administrative. Metadata adds value to a dataset by providing structure (e.g. schemas) and increasing its expressivity (e.g. controlled vocabularies).

According to Weibel & Lagoze (1997, p. 177):

“[the] association of standardized descriptive metadata with networked objects has the potential for substantially improving resource discovery capabilities by enabling field-based (e.g., author, title) searches, permitting indexing of non-textual objects, and allowing access to the surrogate content that is distinct from access to the content of the resource itself.”

These trends influence the functional requirements of the next generation’s Knowledge Organization Systems (KOSs) as a support infrastructure for knowledge sharing and knowledge creation under conditions of distributed intelligence and competence.

Go to previous posts in this series:
Thoughts on KOS (Part1): Getting to grips with “semantic” interoperability or
Thoughts on KOS (Part 2): Classifying Knowledge Organisation Systems

 

References

Saumure, Kristie; Shiri, Ali (2008). Knowledge organization trends in library and information studies: a preliminary comparison of pre- and post-web eras. In: Journal of Information Science, 34/5, 2008, pp. 651–666

The Economist (2010). Data, data everywhere. A special report on managing information. http://www.emc.com/collateral/analyst-reports/ar-the-economist-data-data-everywhere.pdf, accessed 2013-03-10

Weibel, S. L., & Lagoze, C. (1997). An element set to support resource discovery. In: International Journal on Digital Libraries, 1/2, pp. 176-187

15:07

PoolParty 5.1 comes with integrated Graph Search feature

SWC has launched PoolParty Semantic Suite Version 5.1, its taxonomy management and knowledge graph management software platform.

Version 5.1 offers several new features, including an ontology publishing module and an integrated graph-based search application, which shows instantly how changes to the taxonomy will influence search results.

New features of PoolParty 5.1 include:

  • Several updates of 3rd party components used by the PoolParty server, e.g. update of Sesame to version 2.7.14 to gain full SPARQL 1.1 compatibility and provide additional RDF serialization formats (N-Quads, RDF/JSON)
  • GraphSearch per project based on the calculated corpora in corpus management. After successful calculation, a GraphSearch interface is available via a persistent URL, e.g. http://vocabulary.semantic-web.at/PoolParty/graphsearch/cocktails, which is the GraphSearch over a knowledge graph about Cocktails.
  • Unified URI management to support URI creation aligned for projects and custom schemes
  • Enterprise Security: Several measures have been taken to provide the highest possible level of enterprise security
  • Publishing of custom schemes: Similar to the linked data frontend for projects, a schema publishing functionality for custom schemes has been added. A human-readable version of the scheme is displayed by default when accessing the schema URL in a browser. As an example, take a look at the Cocktail ontology.

PoolParty Custom Schema Publishing

Find the detailed Release Notes in our online documentation.

 

 

April 28 2015

09:36

Our semantic event recommendations


Just a couple of years ago critics argued that the semantic approach in IT wouldn’t make the transformation from an inspiring academic discipline to a relevant business application. They were wrong! With the digitalization of business, the power of semantic solutions to handle Big Data became obvious.

Thanks to a dedicated global community of semantic technology experts, we can observe rapid development of software solutions in this field. This progress is coupled with a fast-growing number of corporations implementing semantic solutions to gain insights from existing but unused data.
Knowledge transfer is extremely important in semantics. Let's have a look at the community calendar for the upcoming months. We are looking forward to sharing our experiences and learning. Join us!

Check out the semantic technology event calendar

April 27 2015

13:51

SWC’s Semantic Event Recommendations

Just a couple of years ago critics argued that the semantic approach in IT wouldn't make the transformation from an inspiring academic discipline to a relevant business application. They were wrong! With the digitalization of business, the power of semantic solutions to handle Big Data became obvious. Thanks to a dedicated global community of semantic technology experts, we can observe rapid development of software solutions in this field. This progress is coupled with a fast-growing number of corporations implementing semantic solutions to gain insights from existing but unused data.

Knowledge transfer is extremely important in semantics. Let's have a look at the community calendar for the upcoming months. We are looking forward to sharing our experiences and learning. Join us!

>> Semantic technology event calendar

 

13:03

Bernhard Haslhofer is the new Chief Data Scientist at SWC

Bernhard Haslhofer on his motivation to work as an advisor for Semantic Web Company

Being a researcher by training, it is my job to know the state of the art and to make significant and original contributions in my research field. Understanding and at least keeping pace with technological developments is certainly challenging, but also a major motivation for this job.

In the field of computer science it is common practice to validate and/or demonstrate novel techniques by writing papers and implementing software prototypes. Even though many of those prototypes offer innovative and novel features, they often remain hidden within the scientific community because they lack long-term support, market knowledge, or business skills. Turning research-driven innovation into products therefore requires innovative enterprises that can offer these complementary skills and are open to novel technological approaches.

I strongly believe that tight cooperation between people from academia and industry brings mutual benefits to both sides: research-driven innovation for enterprises as well as a valuable real-world feedback loop for academia.

In recent years, people at SWC have already demonstrated awareness and a high level of openness to novel ideas and developments in academia (e.g., Linked Data) and, above all, showed how those ideas can successfully be transformed into products and business. In my new role as Chief Data Scientist at SWC, I am looking forward to further supporting research-driven innovation by questioning the status quo and identifying concrete steps to improve product features, with the overall goal of getting better at what we do.

 

Short Bio Bernhard Haslhofer

Dr. Bernhard Haslhofer is working as a Data Scientist at the Austrian Institute of Technology. His research interest lies in gaining insights from large-scale and connected datasets by applying machine learning, information retrieval, and network analytics techniques. Previously, Bernhard worked as a postdoctoral researcher and lecturer at Cornell University Information Science, and received a Ph.D. in Computer Science from the University of Vienna. He has numerous Linked Data related publications, serves on several related program committees, and is a recipient of an EU Marie Curie Fellowship and several research awards.

April 21 2015

15:06

Thoughts on KOS (Part 2): Classifying Knowledge Organisation Systems

Traditional KOSs include a broad range of system types from term lists to classification systems and thesauri. These organization systems vary in functional purpose and semantic expressivity. Most of these traditional KOSs were developed in a print and library environment. They have been used to control the vocabulary used when indexing and searching a specific product, such as a bibliographic database, or when organizing a physical collection such as a library (Hodge et al. 2000).

KOS in the era of the Web

With the proliferation of the World Wide Web, new knowledge organization principles emerged, based on hypertextuality, modularity, decentralisation and protocol-based machine communication (Berners-Lee 1998). New forms of KOSs emerged, like folksonomies, topic maps and knowledge graphs, also commonly and broadly referred to as ontologies[1].

With reference to Gruber’s (1993/1993a) classic definition:

“a common ontology defines the vocabulary with which queries and assertions are exchanged among agents” based on “ontological commitments to use the shared vocabulary in a coherent and consistent manner.”

From a technological perspective, ontologies function as an integration layer for semantically interlinked concepts, with the purpose of improving the machine-readability of the underlying knowledge model. Ontologies lift interoperability from the syntactic to the semantic level for the purpose of knowledge sharing. According to Hodge et al. (2003):

“semantic tools emphasize the ability of the computer to process the KOS against a body of text, rather than support the human indexer or trained searcher. These tools are intended for use in the broader, more uncontrolled context of the Web to support information discovery by a larger community of interest or by Web users in general.” (Hodge et al. 2003)

In other words, ontologies are considered valuable for classifying web information in that they aid in enhancing interoperability – bringing together resources from multiple sources (Saumure & Shiri 2008, p. 657).

Which KOS serves your needs?

Schaffert et al. (2005) introduce a model to classify ontologies along their scope, acceptance and expressivity, as can be seen in the figure below.

[Figure: Classification of ontologies along scope, acceptance and expressivity (Schaffert et al. 2005)]

According to this model the design of KOSs has to take account of the user group (acceptance model), the nature and abstraction level of knowledge to be represented (model scope) and the adequate formalism to represent knowledge for specific intellectual purposes (level of expressiveness). Although the proposed classification leaves room for discussion, it can help to distinguish various KOSs from each other and gain a better insight into the architecture of functionally and semantically intertwined KOSs. This is especially important under conditions of interoperability.

[1] It must be critically noted that the inflationary usage of the term “ontology” often in neglect of its philosophical roots has not necessarily contributed to a clarification of the concept itself. A detailed discussion of this matter is beyond the scope of this post. In this paper the author refers to Gruber’s (1993a) definition of ontology as “an explicit specification of a conceptualization”, which is commonly being referred to in artificial intelligence research.

The next post will look at trends in knowledge organization before and after the emergence of the World Wide Web.

Go to the previous post: Thoughts on KOS (Part1): Getting to grips with “semantic” interoperability

References:

Gruber, Thomas R. (1993). Toward Principles for the Design of Ontologies Used for Knowledge Sharing. In: International Journal of Human-Computer Studies, 43, pp. 907-928

Gruber, Thomas R. (1993a). A translation approach to portable ontologies. In: Knowledge Acquisition, 5/2, pp. 199-220

Hodge, Gail (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. In: First Digital Library Federation electronic edition, September 2008. Originally published in trade paperback in the United States by the Digital Library Federation and the Council on Library and Information Resources, Washington, D.C., 2000

Hodge, Gail M.; Zeng, Marcia Lei; Soergel, Dagobert (2003). Building a Meaningful Web: From Traditional Knowledge Organization Systems to New Semantic Tools. In: Proceedings of the 2003 Joint Conference on Digital Libraries (JCDL’03), IEEE

Saumure, Kristie; Shiri, Ali (2008). Knowledge organization trends in library and information studies: a preliminary comparison of pre- and post-web eras. In: Journal of Information Science, 34/5, 2008, pp. 651–666

Schaffert, Sebastian; Gruber, Andreas; Westenthaler, Rupert (2005). A Semantic Wiki for Collaborative Knowledge Formation. In: Reich, Siegfried; Güntner, Georg; Pellegrini, Tassilo; Wahler, Alexander (Eds.). Semantic Content Engineering. Linz: Trauner, pp. 188-202

April 10 2015

12:44

Thoughts on KOS (Part1): Getting to grips with “semantic” interoperability

Enabling and managing interoperability at the data and the service level is one of the strategic key issues in networked knowledge organization systems (KOSs) and a growing issue in effective data management. But why do we need “semantic” interoperability and how can we achieve it?

Interoperability vs. Integration

The concept of (data) interoperability can best be understood in contrast to (data) integration. While integration refers to a process where formerly distinct data sources and their representation models are merged into one newly consolidated data source, interoperability is defined by a structural separation of knowledge sources and their representation models that nevertheless allows connectivity and interactivity between these sources through deliberately defined overlaps in the representation models. Under conditions of interoperability, data sources are designed to provide interfaces for connectivity, sharing and integrating data on top of a common data model, while leaving the original principles of data and knowledge representation intact. Thus, interoperability is an efficient means to improve and ease the integration of data and knowledge sources.

Three levels of interoperability

When designing interoperable KOSs it is important to distinguish between structural, syntactic and semantic interoperability (Galinski 2006):

  • Structural interoperability is achieved by representing metadata using a shared data model like the Dublin Core Abstract Model or RDF (Resource Description Framework).
  • Syntactic interoperability is achieved by serializing data in a shared mark-up language like XML, Turtle or N3.
  • Semantic interoperability is achieved by using a shared terminology or controlled vocabulary to label and classify metadata terms and relations.

Given the fact that metadata standards carry a lot of intrinsic legacy, it is sometimes very difficult to achieve interoperability at all three levels mentioned above. Metadata formats and models are historically grown; they are most of the time the result of community decision processes, often highly formalized for specific functional purposes, and most of the time deliberately rigid and difficult to change. Hence it is important to have a clear understanding and documentation of the application profile of a metadata format as a precondition for enabling interoperability at all three levels. Semantic Web standards do a really good job in this respect!

In the next post, we will take a look at various KOSs and how they differ with respect to expressivity, scope and target group.

April 09 2015

08:37

Transforming music data into a PoolParty project

Goal

For the Nolde project, we were asked to build a knowledge graph containing detailed information about the Austrian music scene: artists, bands and their music releases. We decided to use PoolParty, since these entities should be accessible in an editorial workflow. More details about the implementation will be provided in a later blog post.

In the first round, I want to share my experiences with mapping music data into SKOS. Obviously, LinkedBrainz was the perfect source for collecting and transforming such data, since it is available as RDF/N-Triples dumps and even provides a SPARQL endpoint! LinkedBrainz data is modeled using the Music Ontology.

For example, you can select all mo:MusicArtist instances with a relation to Austria.

[Screenshot: SELECT query]
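The query itself survives only as a screenshot, so the following is a hedged reconstruction of what such a selection might look like; the property linking an artist to Austria (here a wildcard) and the use of the DBpedia resource for Austria are assumptions:

```sparql
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbr:  <http://dbpedia.org/resource/>

# Illustrative only: all music artists that are related, via any property,
# to the DBpedia resource for Austria.
SELECT DISTINCT ?artist ?name
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name ;
          ?p dbr:Austria .
}
```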

I downloaded the LinkedBrainz dump files and imported them into a triple store, together with DBpedia dumps.

With two CONSTRUCT queries, I was able to collect the required data and transform it into SKOS, in a PoolParty-compatible format:

Construct Artists

[Screenshots: CONSTRUCT query for Artists]

Every matching MusicArtist results in a SKOS concept. The foaf:name is mapped to skos:prefLabel (in German).

As you can see, I used Custom Schema features to provide self-describing metadata on top of pure SKOS features: a MusicBrainz link, a MusicBrainz Id, DBpedia link, homepage…

In addition, you can see in the query that data from DBpedia was also collected. In case an owl:sameAs relationship to DBpedia exists, the abstract is retrieved if present; when a DBpedia abstract is available, it is mapped to skos:definition.
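Since the original CONSTRUCT queries are only available as screenshots, here is a minimal sketch of how the artist mapping could look. The custom-schema namespace (ex:) and the exact DBpedia predicates are assumptions for illustration; only the mappings named above (foaf:name to skos:prefLabel, DBpedia abstract to skos:definition) are taken from the post.

```sparql
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/custom-schema/>   # hypothetical custom schema

# Illustrative only: each matching MusicArtist becomes a SKOS concept.
CONSTRUCT {
  ?artist a skos:Concept ;
          skos:prefLabel ?label ;       # foaf:name as a German prefLabel
          ex:dbpediaLink ?dbp ;         # hypothetical custom-schema property
          skos:definition ?abstract .   # DBpedia abstract, when available
}
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name .
  BIND(STRLANG(?name, "de") AS ?label)
  OPTIONAL {
    ?artist owl:sameAs ?dbp .
    FILTER(STRSTARTS(STR(?dbp), "http://dbpedia.org/resource/"))
    OPTIONAL { ?dbp dbo:abstract ?abstract . FILTER(LANG(?abstract) = "de") }
  }
}
```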

Construct Releases (mo:SignalGroups) with relations to Artists

[Screenshots: CONSTRUCT query for Releases]

Similar to the Artists, a matching SignalGroup results in a SKOS Concept. A skos:related relationship is defined between an Artist and his Releases.
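Again only as a hedged sketch (the original query is a screenshot): a possible CONSTRUCT for the releases, where mo:SignalGroup comes from the post but foaf:maker and dc:title are assumptions about the LinkedBrainz modelling.

```sparql
PREFIX mo:   <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# Illustrative only: each release (signal group) of a converted artist becomes
# a SKOS concept and is linked to its artist via skos:related.
CONSTRUCT {
  ?release a skos:Concept ;
           skos:prefLabel ?label ;
           skos:related ?artist .
}
WHERE {
  ?release a mo:SignalGroup ;
           foaf:maker ?artist ;    # assumed artist-release linking property
           dc:title ?title .       # assumed title property
  ?artist a mo:MusicArtist .
  BIND(STRLANG(?title, "de") AS ?label)
}
```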

Outcome

The SPARQL CONSTRUCT queries provided Turtle (.ttl) files that could be imported directly into PoolParty, resulting in a project containing nearly 1,000 Artists and 10,000 Releases:

PoolParty thesaurus

April 01 2015

13:06

Bazinga! Minutes from the CKAN Association Steering Group – 1 April (no joke)

Readme.txt

The following minutes represent what the Steering Group discussed today, but please remember it's also just a meeting (context: no real work is ever done in a meeting). The objective is to discuss and assign actions when needed, to make decisions when needed, and to generally align everyone in the various ways each member is already supporting the CKAN project. Reading between the lines of this update, there are a few points to call out and make mention of.

  1. The Steering Group (SG) are renewed with energy and determination. While the last meeting might have been some time ago we have set ourselves the objective of meeting weekly (after next week) because it is clear that the CKAN project is advancing rapidly and support from the SG needs to align with the velocity of the project without any risk of holding it back. Let’s add some buzzwords and suggest that the SG is aiming to bootstrap the project and intersect on multiple vectors to achieve maximum lift via regular and meaningful engagement with its project stakeholders (Please don’t take that last sentence seriously).
  2. ‘Distill out a 1-3 pager’ in relation to the business plan means getting lean and putting focus on the most essential parts of the CKAN Association business plan. Long docs with much wordage are great in some situations but in the case of the CKAN project we have an avid community of exceptionally bright people who are fine with the key objectives, strategies and tactics put forward in the most succinct way possible.
  3. If there is to be an operating model for the SG then it will be this: Say what is going to get done. Get it done+. Let everyone know it is done.
  4. Some awesome questions are answered at the end of this post.

+ In some cases things might not actually get done but we will strive to do the best we can. Yes, we’ll be transparent with goals. Yes, we’ll be happy to take any and all feedback. Yes, we are working for the CKAN project and are ultimately governed via public peer review by the project’s community.

CKAN Association Steering Group Meeting 1 April 2015

  • Present: Ashley Casovan (Chair), Steven De Costa, Rufus Pollock (Secretary)
  • Apologies: Antonio Acuna

Minutes

  1. Steering Group Goals (next quarter)
    1. Announce more clearly existence and purpose of Steering Group
      1. steering group email alias: steering-group@ckan.org (goes to group)
    2. Announce objectives which are
      1. Finalise business plan (have now had out for consultation for some time)
        1. Distill out a 1-3 pager
        2. Finalise and announce to list
        3. hangout on air to announce
      2. Community meetings
        1. Technical team to run at least one (general) developer community meeting in the next 2 months
        2. At least one user community meeting in the next 2 months
  2. Responsibilities of the SG
    1. Like a board – see http://ckan.org/about/steering-group/
    2. Similarities to Drupal Board: job is to support the community in moving the project forward – self-determination
  3. Review Actions
    1. https://trello.com/b/D6zxiuFJ/ckan-association-steering-group – primarily business plan and response to questions [note this Trello board is private]
  4. CKAN Event at IODC
    1. CIOs and CTO – CKAN is part of national and regional infrastructure
    2. efficiency gains on open data
    3. https://github.com/ckan/ideas-and-roadmap/issues/120
    4. Technical capability
  5. Review of student position description
    1. Ashley to send out to SG members for comment
  6. Meeting schedule: SG will meet weekly (for present) every Thursday at 12 noon UK (for 30m)
  7. Publishing minutes from this meeting – will aim to send asap

Your questions answered

Q: Is the SG interested in increasing transparency of the SG meetings? How will this be achieved?

SG: Yes. This was discussed and we would like to propose the next meeting be run in two parts. One part will be closed to attend to some regular business of the group with regard to coordinating efforts. The second half will be broadcast as a Hangout on Air for people to watch. We’ll aim to collect questions ahead of the meetings and address them during the broadcast with further options for an active Q&A session from the audience.

Q: Have the SG determined whether members of the association are yet contributing funds, or developers to the project? What are they? What happens if members don’t?

SG: There is ongoing work in this area. Most members are contributing in-kind (not exclusively developers). We’d be happy to make the pledged contributions public via the members listing on CKAN.org. At this time it is an honour system with regard to meeting membership obligations that are provided in-kind. If a member is suspected of not providing the expected level of in-kind contribution then the Steering Group will investigate and consider appropriate actions upon conclusion of such investigations.

Q: How does the SG see its role with respect to providing direction for the project?

SG: Support the community of both technical stakeholders and users in ways which allow them to act in concert to move the project forward in the direction these stakeholders determine to be best for the project.

Q: How is the SG raising more funds, other than membership, to further fund development of CKAN?

SG: This is a question the Steering Group is working through currently. Our focus is on the Business Plan and putting strategic objectives down for all to see via that document. Grant applications and the coordination of requirements to meet the needs of a group of platform owners is also being considered. With the latter the proposed approach is to release an expression of interest for funding support against specific development activities. Those who highly value such activities would be asked to help contribute to a pool of funds that would then see the development work paid for.

March 31 2015

05:13

CKAN Association Steering Group – about to set sail!

The CKAN Association Steering Group will be meeting about 30 hours from now. I wanted to make sure we took the opportunity to ask for community questions regarding the CKAN project.

So, please comment here with any questions you might like discussed and/or answered by the steering group :)

This will be my first chance to catch up with everyone in the group so I will have lots of questions of my own. I’m also keen to provide updates on how I see things are going with regard to developing and extending the CKAN community and its reach with regard to communications activities. We have a modest starting point, so updates will be easy to provide. It would be great to get comments via this post on what more people would like to see. However, there are many action items incomplete from within the Community and Communications Team so I’ll also be reporting on that. We don’t yet have a list of CKAN vendors and this is clearly needed based on the number of CKAN Dev list requests regarding upgrade questions when planning a move to 2.3.

Some great positive indicators I see for the project are the number of people active on the CKAN Dev email list and the high volume of quality conversations that are taking place there. It appears that the 2.3 release has been the catalyst needed for a fantastic reinvestment (at least publicly) from both the regular technical team members and the wider community of awesome people doing amazing stuff within their own open data projects. I would like those on the steering group to recognise this change and actively work to support ###MOAR###!

As a new member of the steering group I should introduce myself. You can see the bio attached to this post but for a fresh video-cast of something I’m involved in within my local area you can also take a look at the Australian Open Knowledge Chapter Board meeting that was held earlier today. The video is embedded below. I do actually mention the work I’ve been doing within the CKAN association at some point so please excuse the ‘inception’-like self referential nature of all this.

The main message here is – steering group meeting in about 30 hours. Please comment on this post to amplify your voice within that forum.

Rock on! Steven

 

March 20 2015

12:05

Presenting public finance just got easier

mexico_ckan_openspending

CKAN 2.3 is out! The world-famous data handling software suite which powers data.gov, data.gov.uk and numerous other open data portals across the world has been significantly upgraded. How can this version open up new opportunities for existing and coming deployments? Read on.

One of the new features of this release is the ability to create extensions that get called before and after a new file is uploaded, updated, or deleted on a CKAN instance.

This may not sound like a major improvement, but it creates a lot of new opportunities. Now it's possible to analyse the files (which are called resources in CKAN) and put them to new uses based on that analysis. To showcase how this works, Open Knowledge, in collaboration with the Mexican government, the World Bank (via the Partnership for Open Data), and the OpenSpending project, has created a new CKAN extension which uses this new feature.

It’s actually two extensions. One, called ckanext-budgets, listens for creation and updates of resources (i.e. files) in CKAN, and when that happens the extension analyses the resource to see if it conforms to the data file part of the Budget Data Package specification. The Budget Data Package specification is a relatively new specification for budget publications, designed for comparability, flexibility, and simplicity. It is similar to data packages in that it provides metadata around simple tabular files, like a CSV file. If the CSV file (a resource in CKAN) conforms to the specification (i.e. the columns have the correct titles), then the extension automatically creates the Budget Data Package metadata based on the CKAN resource data and makes the complete Budget Data Package available.

It might sound very technical, but it really is very simple. You add or update a CSV file resource in CKAN and it automatically checks whether it contains budget data in order to publish it in a standardised form. In other words, CKAN can now automatically produce standardised budget resources which make integration with other systems a lot easier.

The second extension, called ckanext-openspending, shows how easy such an integration around standardised data is. The extension takes the published Budget Data Packages and automatically sends it to OpenSpending. From there OpenSpending does its own thing, analyses the data, aggregates it and makes it very easy to use for those who use OpenSpending’s visualisation library.

So thanks to a perhaps seemingly insignificant extension feature in CKAN 2.3, getting beautiful and understandable visualisations of budget spreadsheets is now only an upload to a CKAN instance away (and can only get easier as the two extensions improve).

To learn even more, see this report about the CKAN and OpenSpending integration efforts.

March 11 2015

20:18

If ‘Change’ had a favourite number…it would be 2.3

There’s something about the number 2.3. It just rolls off the tongue with such an easy rectitude. Western families reportedly average 2.3 children; there were 2.3 million Americans out of work when Barack Obama took Office; Starbucks go through 2.3 million paper cups a year. But the 2.3 that resonates with me most is 2.3 billion. That was the world population in the late 1940s, and growing. WWII was over and we were finally able to stand up, dust off the despair of war and Depression, bask in a renewed confidence in the future, and make a lot of babies. We were on the brink of something, and what those babies didn’t know yet was that they would grow up to usher in a wave of unprecedented social, economic and technological change.

We are on the brink again. Open data is gaining momentum faster than the Baby Boomers are growing old, and it has the potential to steer that wave of change in all manner of directions. We’re ready for the next 2.3. Enter CKAN 2.3.

Here are some of the most exciting updates:

  • Completely refactored resource data visualizations, allowing multiple persistent views of the same data and an interface to manage and configure them. Check the updated documentation to learn more, and the “Changes and deprecations” section for migration details: http://docs.ckan.org/en/ckan-2.3/maintaining/data-viewer.html

  • Responsive design for the default theme, which allows nicer rendering across different devices

  • Improved DataStore filtering and full text search capabilities

  • Added new extension points to modify the DataStore behaviour

  • Simplified two-step dataset creation process

  • Ability for users to regenerate their own API keys

  • Changes on the authentication mechanism to allow more secure set-ups. See “Changes and deprecations” section for more details and “Troubleshooting” for migration instructions.

  • Better support for custom dataset types

  • Updated documentation theme, now clearer and responsive

If you are upgrading from a previous version, make sure to check the “Changes and deprecations” section in the CHANGELOG, especially regarding the authorization configuration and data visualizations.

To install the new version, follow the relevant instructions from the documentation depending on whether you are using a package or source install: http://docs.ckan.org/en/ckan-2.3/maintaining/installing/index.html

If you are upgrading an existing CKAN instance, follow the upgrade instructions: http://docs.ckan.org/en/ckan-2.3/maintaining/upgrading/index.html

We have also made available patch releases for the 2.0.x, 2.1.x and 2.2.x versions. It is important to apply these, as they contain important security and stability fixes. Patch releases are fully backwards compatible and really easy to install: http://docs.ckan.org/en/latest/maintaining/upgrading/upgrade-package-to-patch-release.html

Charting the CKAN boom.

The following graph charts population from 1800 to 2100 but we’re interested in the period from the mid-1940s when there was a marked boost in population growth.


World population estimates from 1800 to 2100. Sourced from Wikipedia: http://en.wikipedia.org/wiki/World_population The growth from 2.3 Billion in the 1940s is the Boom!

With the recent release of CKAN 2.3 we’re expecting a similar boost in community contributions. To add your voice to the community and boost the profile of the CKAN project please share a picture on twitter and include the hashtag #WeAreCKAN.


March 02 2015

04:58

The CKAN Association: Membership has its benefits.

The CKAN Association, established in 2014, is set to grow rapidly in 2015 with a number of initiatives now being planned to attract free tier Supporter members as well as paid members for the Gold, Silver and Bronze tiers.

The newly established Community and Communication Team (C&C Team) is recruiting members now via their Google Group at: https://groups.google.com/forum/?hl=en-GB#!forum/ckan-association-community-and-communication-team-group

The team needs your help with website updates, creative content development and community engagement. As a new team within the CKAN project, they are looking for self-motivated people to initially join a core team that will set the strategic communication objectives for the project and help realise the incredible potential of the CKAN project.

If you can contribute as little as one or two hours per week then you’ll earn yourself a CKAN Association supporter badge, but that is just the start… by joining the C&C Team you’ll be in the middle of things and help to grow a worldwide community of awesomeness.

CKAN Association Badges

The following CKAN Association badges are now available. If you are already a member of the Tech Team, then you can request the Supporter Member badge via the C&C Team Google Group.

Badge files and usage policy will be available on CKAN.org soon <- This is one of the todo items the C&C Team are recruiting help for!

The current list of CKAN Association members can be found here: http://ckan.org/about/members/

CKAN Association Badges

February 27 2015

12:57

AKSW Colloquium: Tommaso Soru and Martin Brümmer on Monday, March 2 at 3.00 p.m.

On Monday, 2nd of March 2015, Tommaso Soru will present ROCKER, a refinement operator approach for key discovery. Martin Brümmer will then present NIF annotation and provenance – A comparison of approaches.

Tommaso Soru – ROCKER – Abstract

As within the typical entity-relationship model, unique and composite keys are of central importance also when their concept is applied on the Linked Data paradigm. They can provide help in manifold areas, such as entity search, question answering, data integration and link discovery. However, the current state of the art does not count approaches able to scale while relying on a correct definition of key. We thus present a refinement-operator-based approach dubbed ROCKER, which has shown to scale to big datasets with respect to the run time and the memory consumption. ROCKER will be officially introduced at the 24th International Conference on World Wide Web.

Tommaso Soru, Edgard Marx, and Axel-Cyrille Ngonga Ngomo, “ROCKER – A Refinement Operator for Key Discovery”. [PDF]

Martin Brümmer - Abstract – NIF annotation and provenance – A comparison of approaches

The increasing uptake of the NLP Interchange Format (NIF) reveals its shortcomings on a number of levels. One of these is tracking metadata of annotations represented in NIF – which NLP tool added which annotation, with what confidence, at which point in time, etc.

A number of solutions to this task of annotating annotations expressed as RDF statements have been proposed over the years. The talk will weigh these solutions – namely annotation resources, reification, Open Annotation, quads and singleton properties – in regard to their granularity, ease of implementation and query complexity.

The goal of the talk is presenting and comparing viable alternatives of solving the problem at hand and collecting feedback on how to proceed.

February 20 2015

10:00

SEMANTiCS2015: Calls for Research & Innovation Papers, Industry Presentations and Poster/Demos are now open!

The SEMANTiCS2015 conference comes back this year, in its 11th edition, to Vienna, Austria, where it all started in 2005!

The conference takes place from 15-17 September 2015 (the main conference will be on 16-17 September, with several back-to-back workshops & events on the 15th) at the University of Economics – see all information: http://semantics.cc/.


We are happy to announce the SEMANTiCS Open Calls as follows. All information on the Calls can also be found on the SEMANTiCS2015 website here: http://semantics.cc/open-calls

Call for Research & Innovation Papers

The Research & Innovation track at SEMANTiCS welcomes the submission of papers on novel scientific research and/or innovations relevant to the topics of the conference. Submissions must be original and must not have been submitted for publication elsewhere. Papers should follow the ACM ICPS guidelines for formatting (http://www.acm.org/sigs/publications/proceedings-templates) and must not exceed 8 pages in length for full papers and 4 pages for short papers, including references and optional appendices.

Abstract Submission Deadline: May 22, 2015
Paper Submission Deadline: May 29, 2015
Notification of Acceptance: July 10, 2015
Camera-Ready Paper: July 24, 2015
Details: http://bit.ly/semantics15-research

Call for Industry & Use Case Presentations

To address the needs and interests of industry, SEMANTiCS presents enterprise solutions that deal with semantic processing of data and/or information in areas like Linked Data, Data Publishing, Semantic Search, Recommendation Services, Sentiment Detection, Search Engine Add-Ons, Thesaurus and/or Ontology Management, Text Mining, Data Mining and related fields. All submissions should have a strong focus on real-world applications beyond prototype status and demonstrate the power of semantic systems!

Submission Deadline: July 1, 2015
Notification of Acceptance: July 20, 2015
Presentation Ready: August 15, 2015
Details: http://bit.ly/semantics15-industry

Call for Posters and Demos

The Posters & Demonstrations Track invites innovative work in progress, late-breaking research and innovation results, and smaller contributions (including pieces of code) in all fields related to the broadly understood Semantic Web. The informal setting of the Posters & Demonstrations Track encourages participants to present innovations to business users and find new partners or clients.  In addition to the business stream, SEMANTiCS 2015 welcomes developer-oriented posters and demos to the new technical stream.

Submission Deadline: June 17, 2015
Notification of Acceptance: July 10, 2015
Camera-Ready Paper: August 01, 2015
Details: http://bit.ly/semantics15-poster

We are looking forward to receiving your submissions for SEMANTiCS2015 and to seeing you in Vienna in autumn!

February 19 2015

21:53

AKSW Colloquium: Edgard Marx and Tommaso Soru on Monday, February 23, 3.00 p.m.

On Monday, 23rd of February 2015, Edgard Marx will introduce Smart, a search engine designed over the Semantic Search paradigm; subsequently, Tommaso Soru will present ROCKER, a refinement operator approach for key discovery.

EDIT: Tommaso Soru’s presentation was moved to March 2nd.

Abstract – Smart

Since the conception of the Web, search engines have played a key role in making content available. However, retrieving the desired information is still significantly challenging. Semantic Search systems are a natural evolution of traditional search engines. They promise more accurate interpretation by understanding the contextual meaning of the user query. In this talk, we will introduce our audience to Smart, a search engine designed over the Semantic Search paradigm. Smart incorporates two of our currently designed approaches to the problem of Information Retrieval, as well as a novel interface paradigm. Moreover, we will present some of the earlier, as well as more recent, state-of-the-art approaches used by industry – for instance by Yahoo!, Google and Facebook.

Abstract – ROCKER

As within the typical entity-relationship model, unique and composite keys are of central importance also when their concept is applied on the Linked Data paradigm. They can provide help in manifold areas, such as entity search, question answering, data integration and link discovery. However, the current state of the art does not count approaches able to scale while relying on a correct definition of key. We thus present a refinement-operator-based approach dubbed ROCKER, which has shown to scale to big datasets with respect to the run time and the memory consumption. ROCKER will be officially introduced at the 24th International Conference on World Wide Web.

Tommaso Soru, Edgard Marx, and Axel-Cyrille Ngonga Ngomo, “ROCKER – A Refinement Operator for Key Discovery”. [PDF]

February 18 2015

13:04

Data to Value & Semantic Web Company agree partnership to bring cutting edge Semantic Management to Financial Services clients

The partnership aims to change the way organisations, particularly within Financial Services, manage the semantics embedded in their data landscapes. This will offer several core benefits to existing and prospective clients including locating, contextualising and understanding the meaning and content of Information faster and at a considerably lower cost. The partnership will achieve this through combining the latest Information Management and Semantic techniques including:

  • Text Mining, Tagging, Entity Definition & Extraction.
  • Business Glossary, Data Dictionary & Data Governance techniques.
  • Taxonomy, Data Model and Ontology development.
  • Linked Data & Semantic Web analyses.
  • Data Profiling, Mining & Discovery.

This includes improving regulatory compliance in areas such as BCBS, enabling new investment research and client reporting techniques as well as general efficiency drivers such as faster integration of mergers and acquisitions. As part of the partnership, Data to Value Ltd. will offer solution services and training in PoolParty product offerings, including ontology development and data modeling services.

Nigel Higgs, Managing Director of Data to Value notes; “this is an exciting collaboration between two firms which are pushing the boundaries in the way Data, Information and Semantics are managed by business stakeholders. We spend a great deal of time helping organisations at a grass roots level pragmatically adopt the latest Information Management techniques. We see this partnership as an excellent way for us to help organisations take realistic steps to adopting the latest semantic techniques.”

Andreas Blumauer, CEO of Semantic Web Company adds, “The consortium of our two companies offers a unique bundle, which consists of a world-class semantic platform and a team of experts who know exactly how Semantics can help to increase the efficiency and reliability of knowledge intensive business processes in the financial industry.”

February 17 2015

14:38

Call for Feedback on LIDER Roadmap

The LIDER project is gathering feedback on a roadmap for the use of Linguistic Linked Data for content analytics.  We invite you to give feedback in the following ways:

Excerpt from the roadmap

Full document: available here
Summary slides: available here

Content is growing at an impressive, exponential rate. Exabytes of new data are created every single day. In fact, data has been recently referred to as the “oil” of the new economy, where the new economy is understood as “a new way of organizing and managing economic activity based on the new opportunities that the Internet provided for businesses” .

Content analytics, i.e. the ability to process and generate insights from existing content, plays and will continue to play a crucial role for enterprises and organizations that seek to generate value from data, e.g. in order to inform decision and policy making.

As corroborated by many analysts, substantial investments in technology, partnerships and research are required to reach an ecosystem consisting of many players and technological solutions that provide the necessary infrastructure, expertise and human resources required to make sure that organizations can effectively deploy content analytics solutions at large scale in order to generate relevant insights that support policy and decision making, or even to define completely new business models in a data-driven economy.

Assuming that such investments need to be and will be made, this roadmap explores the role that linked data and semantic technologies can and will play in the field of content analytics and will generate a set of recommendations for organizations, funders and researchers on which technologies to invest as a basis to prioritize their investment in R&D as well as on optimizing their mid- and long-term strategies and roadmaps.

Conference Call on 19th of February 3 p.m. CET

Connection details: https://www.w3.org/community/ld4lt/wiki/Main_Page#LD4LT_calls
Summary slides: available here

Agenda

  1. Introduction to the LIDER Roadmap (Philipp Cimiano, 10 minutes)
  2. Discussion of Global Customer Engagement Use Cases (All, 10 minutes)
  3. Discussion of Public Sector and Civil Society Use Cases (All, 10 minutes)
  4. Discussion of Linked Data Life Cycle and Linguistic Linked Data Value Chain (All, 10 minutes)
  5. General Discussion on further use cases, items in the roadmap etc. (20 minutes)

In addition, the call will briefly discuss progress of meta-share linked data metadata model.

The call is open to the public, no LD4LT group participation is required. Dial-in information is available. Please spread this information widely. No knowledge about linguistic linked data is required. We especially are interested in feedback from potential users of linguistic linked data.

About the LIDER Project

Website: http://lider-project.eu

The project’s mission is to provide the basis for the creation of a Linguistic Linked Data cloud that can support content analytics tasks of unstructured multilingual cross-media content. By achieving this goal, LIDER will impact on the ease and efficiency with which Linguistic Linked Data will be exploited in content analytics processes.

We aim at providing the basis for a new Linked Open Data (LOD) based ecosystem of free, interlinked, and semantically interoperable language resources (corpora, dictionaries, lexical and syntactic metadata, etc.) and media resources (image, video, etc. metadata) that will allow free and open exploitation of such resources in multilingual, cross-media content analytics across the EU and beyond, with specific use cases in industries related to social media, financial services, localization, and other multimedia content providers and consumers.

Take part in a personal interview to include your voice in the roadmap

Contact: http://lider-project.eu/?q=content/contact-us

The EU project LIDER has been tasked by the European Commission to put together a roadmap for future R&D funding in multilingual industries such as content and knowledge localization, multilingual terminology and taxonomy management, cross-border business intelligence, etc. Since you are a leading supplier of solutions in one or more of these industries, we would need your input for this roadmap. We would like to conduct a short interview with you to establish your views on current and developing R&D efforts in multilingual and semantic technologies that will likely play an increasing role in these industries, such as Linked Data and related standards for web-based, multilingual data processing. The interview will cover the 5 questions below and will not take more than 30 minutes. Please let us know a suitable time and date.
