SPARQL: how to group this data correctly

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

First of all, I didn't make a minimal example because I think my problem can be understood without one.

Second, I didn't include the data because I think my problem can be solved without it. However, I'm happy to share it if you ask.

This is my query:

select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
{
  values ?user {bo:ania}
  # ?x is bound to the items that the user bo:ania has liked.
  ?user rs:hasRated ?ratings.
  ?ratings a rs:Likes.
  ?ratings rs:aboutItem ?x.
  ?ratings rs:ratesBy ?ratingValue.
  # level 0 class similarities
  {
    # Extract all the items that are of the same class (type) as the liked items.
    # I assumed that being of the same class accounts for 50% of the similarity.
    # This value can be changed according to the test or the application domain.
    values ?classImportance {0.5} # class level
    bind (?classImportance as ?importance)
    bind (4/7 as ?levelImportance)
    ?x a ?class.
    ?class rdfs:subClassOf ?mainClass .
    ?mainClass rdfs:subClassOf rs:RecommendableClass .
    ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
    ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
    ?classSimilarity rs:appliedOnClass ?class .
    ?classSimilarity rs:hasClassSimilarityValue ?similarity .
    ?item a ?class.
    bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  union
  # level 0 instance similarities
  {
    # Extract the items that share the same value for important predicates with the already liked items.
    # I assumed that having the same instance for important predicates accounts for 100% of the similarity.
    # This value can be changed according to the test or the application domain.
    values ?instanceImportance {1} # instance level
    bind (?instanceImportance as ?importance)
    bind (4/7 as ?levelImportance)
    ?x a ?class.
    ?class rdfs:subClassOf ?mainClass .
    ?mainClass rdfs:subClassOf rs:RecommendableClass .
    ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
    ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
    ?propertySimilarity rs:appliedOnProperty ?property .
    ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
    ?x ?property ?value .
    ?item ?property ?value .
    bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  filter (?x != ?item)
}

This is the result:

As you can see, the result contains many rows for the same suggestedItem. I want to group by suggestedItem and sum the values of finalSimilarity.

I tried this:

select ?item (SUM(?similarity * ?importance * ?levelImportance) as ?finalSimilarity) (group_concat(distinct ?x) as ?likedItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason)
where
{
  values ?user {bo:ania}
  # ?x is bound to the items that the user bo:ania has liked.
  ?user rs:hasRated ?ratings.
  ?ratings a rs:Likes.
  ?ratings rs:aboutItem ?x.
  ?ratings rs:ratesBy ?ratingValue.
  # level 0 class similarities
  {
    # Extract all the items that are of the same class (type) as the liked items.
    # I assumed that being of the same class accounts for 50% of the similarity.
    # This value can be changed according to the test or the application domain.
    values ?classImportance {0.5} # class level
    bind (?classImportance as ?importance)
    bind (4/7 as ?levelImportance)
    ?x a ?class.
    ?class rdfs:subClassOf ?mainClass .
    ?mainClass rdfs:subClassOf rs:RecommendableClass .
    ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
    ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
    ?classSimilarity rs:appliedOnClass ?class .
    ?classSimilarity rs:hasClassSimilarityValue ?similarity .
    ?item a ?class.
    bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  union
  # level 0 instance similarities
  {
    # Extract the items that share the same value for important predicates with the already liked items.
    # I assumed that having the same instance for important predicates accounts for 100% of the similarity.
    # This value can be changed according to the test or the application domain.
    values ?instanceImportance {1} # instance level
    bind (?instanceImportance as ?importance)
    bind (4/7 as ?levelImportance)
    ?x a ?class.
    ?class rdfs:subClassOf ?mainClass .
    ?mainClass rdfs:subClassOf rs:RecommendableClass .
    ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
    ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
    ?propertySimilarity rs:appliedOnProperty ?property .
    ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
    ?x ?property ?value .
    ?item ?property ?value .
    bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
  }
  filter (?x != ?item)
}
group by ?item
order by desc(?finalSimilarity)

but the result is:

Something is wrong with my approach: if you look at the finalSimilarity in …, the value is 1.7. However, if you sum the same values manually from the first query's result, you get 0.62. So I did something wrong.

Could you help me find the mistake?

Please note that the bodies of the two queries are identical. Only the select clauses differ.

Hint

I can already solve it with two nested selects, like this:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rs: <http://www.SemanticRecommender.com/rs#>
PREFIX bo: <http://www.BookOntology.com/bo#>
PREFIX : <http://www.SemanticBookOntology.com/sbo#>

select ?suggestedItem (SUM(?finalSimilarity) as ?summedFinalSimilarity) (group_concat(distinct strafter(str(?likedItem), "#")) as ?becauseYouHaveLikedThisItem) (group_concat(?becauseOf ; separator = " ,and ") as ?reason)
where {
  select distinct (?x as ?likedItem) (?item as ?suggestedItem) ?similarity ?becauseOf ((?similarity * ?importance * ?levelImportance) as ?finalSimilarity)
  where
  {
    values ?user {bo:ania}
    # ?x is bound to the items that the user bo:ania has liked.
    ?user rs:hasRated ?ratings.
    ?ratings a rs:Likes.
    ?ratings rs:aboutItem ?x.
    ?ratings rs:ratesBy ?ratingValue.
    # level 0 class similarities
    {
      # Extract all the items that are of the same class (type) as the liked items.
      # I assumed that being of the same class accounts for 50% of the similarity.
      # This value can be changed according to the test or the application domain.
      values ?classImportance {0.5} # class level
      bind (?classImportance as ?importance)
      bind (4/7 as ?levelImportance)
      ?x a ?class.
      ?class rdfs:subClassOf ?mainClass .
      ?mainClass rdfs:subClassOf rs:RecommendableClass .
      ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
      ?similarityConfiguration rs:hasClassSimilarity ?classSimilarity .
      ?classSimilarity rs:appliedOnClass ?class .
      ?classSimilarity rs:hasClassSimilarityValue ?similarity .
      ?item a ?class.
      bind (concat("it shares the same class, which is ", strafter(str(?class), "#"), ", with ", strafter(str(?x), "#")) as ?becauseOf)
    }
    union
    # level 0 instance similarities
    {
      # Extract the items that share the same value for important predicates with the already liked items.
      # I assumed that having the same instance for important predicates accounts for 100% of the similarity.
      # This value can be changed according to the test or the application domain.
      values ?instanceImportance {1} # instance level
      bind (?instanceImportance as ?importance)
      bind (4/7 as ?levelImportance)
      ?x a ?class.
      ?class rdfs:subClassOf ?mainClass .
      ?mainClass rdfs:subClassOf rs:RecommendableClass .
      ?mainClass rs:hasSimilarityConfiguration ?similarityConfiguration .
      ?similarityConfiguration rs:hasPropertySimilarity ?propertySimilarity .
      ?propertySimilarity rs:appliedOnProperty ?property .
      ?propertySimilarity rs:hasPropertySimilarityValue ?similarity .
      ?x ?property ?value .
      ?item ?property ?value .
      bind (concat("it shares ", strafter(str(?value), "#"), " for predicate ", strafter(str(?property), "#"), " with ", strafter(str(?x), "#")) as ?becauseOf)
    }
    filter (?x != ?item)
  }
}
group by ?suggestedItem
order by desc(?summedFinalSimilarity)

But that seems like a clumsy solution to me. There must be a cleverer way to get the aggregated data with a single select.

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:13:30+0000

Without seeing your data it's impossible to say for sure, and with a query this big it's probably not worth trying to pin down the exact problem. However, this can easily happen if your results contain duplicates. Duplicates are easy to get, especially with a union in which some solution can match both branches. For instance, suppose you have data like this:

@prefix : <urn:ex:> .

:x :similar [ :sim 0.10 ; :mult 2 ] ,
            [ :sim 0.12 ; :mult 1 ] ,
            [ :sim 0.12 ; :mult 1 ] ,  # yup, a duplicate
            [ :sim 0.15 ; :mult 4 ] .

Then if you run this query, you'll get four result rows:

prefix : <urn:ex:>

select ?sim ((?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

----------------
| sim  | final |
================
| 0.15 | 0.60  |
| 0.12 | 0.12  |
| 0.12 | 0.12  |
| 0.10 | 0.20  |
----------------

However, if you select distinct, you'll only see three:

select distinct ?sim ((?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

----------------
| sim  | final |
================
| 0.15 | 0.60  |
| 0.12 | 0.12  |
| 0.10 | 0.20  |
----------------
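To make the difference between those two results concrete, here is a small Python model (a sketch of the semantics, not a SPARQL engine) of how the solutions are kept: a plain select returns a multiset of rows, while select distinct drops identical rows. The rows mirror the example data above; the function names are made up for this sketch.

```python
from decimal import Decimal

# One (?sim, ?mult) pair per solution of  :x :similar [ :sim ?sim ; :mult ?mult ] .
# Two different blank nodes carry the same values, so (0.12, 1) appears twice.
SOLUTIONS = [
    (Decimal("0.10"), 2),
    (Decimal("0.12"), 1),
    (Decimal("0.12"), 1),
    (Decimal("0.15"), 4),
]

def project(solutions):
    """Model of select ?sim ((?sim * ?mult) as ?final): duplicate rows are kept."""
    return [(sim, sim * mult) for sim, mult in solutions]

def project_distinct(solutions):
    """Model of select distinct: identical projected rows collapse into one.
    dict.fromkeys keeps the first occurrence of each row, in order."""
    return list(dict.fromkeys(project(solutions)))

plain_rows = project(SOLUTIONS)
distinct_rows = project_distinct(SOLUTIONS)
print(len(plain_rows), len(distinct_rows))  # 4 3
```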

Once you aggregate with sum, both of those duplicate rows are included:

select (sum(?sim * ?mult) as ?final) {
  :x :similar [ :sim ?sim ; :mult ?mult ] .
}

---------
| final |
=========
| 1.04  |
---------
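The aggregate runs over every solution, duplicates included. Here is a Python model of that arithmetic, using the same toy values. It uses Decimal because the literals in the example are xsd:decimal values; the variable names are illustrative only.

```python
from decimal import Decimal

# (?sim, ?mult) pairs from the example data; (0.12, 1) occurs twice
# because two different blank nodes carry the same values.
SOLUTIONS = [
    (Decimal("0.10"), 2),
    (Decimal("0.12"), 1),
    (Decimal("0.12"), 1),
    (Decimal("0.15"), 4),
]

# sum(?sim * ?mult) sees all four solutions.
total_all = sum(sim * mult for sim, mult in SOLUTIONS)

# Summing only the distinct solutions gives a smaller number.
total_distinct = sum(sim * mult for sim, mult in set(SOLUTIONS))

print(total_all, total_distinct)  # 1.04 0.92
```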

That sum (0.60 + 0.12 + 0.12 + 0.20 = 1.04) covers all four rows, not just the three distinct ones, which would total 0.92. Even if the data doesn't contain duplicate values, a union can introduce duplicate results:

@prefix : <urn:ex:> .

:x :similar [ :sim 0.10 ; :mult 2 ] ,
            [ :sim 0.12 ; :mult 1 ] ,
            [ :sim 0.15 ; :mult 4 ] .

prefix : <urn:ex:>

select (sum(?sim * ?mult) as ?final) {
  { :x :similar [ :sim ?sim ; :mult ?mult ] }
  union
  { :x :similar [ :sim ?sim ; :mult ?mult ] }
}

---------
| final |
=========
| 1.84  |
---------
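A union does not deduplicate. It keeps every solution from both branches, so each matching row here is counted twice (2 × 0.92 = 1.84). A minimal Python model of that behavior follows. The `union` helper is hypothetical and models only the multiset semantics, not any particular SPARQL engine.

```python
from decimal import Decimal

# Data without duplicate values this time: (?sim, ?mult) per solution.
BRANCH = [
    (Decimal("0.10"), 2),
    (Decimal("0.12"), 1),
    (Decimal("0.15"), 4),
]

def union(left, right):
    """Model of { P } union { Q }: all solutions from both sides are kept."""
    return left + right

# Both branches use the same pattern, so every solution appears twice.
rows = union(BRANCH, BRANCH)
total = sum(sim * mult for sim, mult in rows)
print(len(rows), total)  # 6 1.84
```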

Since you found the need to use group_concat(distinct …), I wouldn't be surprised if there are duplicates of that nature.
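If duplicate rows are the cause, that would also explain why the two-select version in the question gives the expected numbers: its inner select distinct removes identical rows before the outer group by sums them. The Python sketch below models three options on made-up rows (the item names, reasons, and values are hypothetical, as are the helper names `group_sum` and `group_sum_distinct_values`). It also shows why sum(distinct …) is not a drop-in substitute. That aggregate removes repeated *values* within a group, even when they come from genuinely different rows.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical rows: (likedItem, suggestedItem, becauseOf, finalSimilarity).
# The first two rows are identical, as if both union branches had produced them.
# The third has the same value but a different reason, so it is a genuine row.
ROWS = [
    ("liked1", "book1", "same class", Decimal("0.25")),
    ("liked1", "book1", "same class", Decimal("0.25")),
    ("liked2", "book1", "same author", Decimal("0.25")),
    ("liked1", "book2", "same genre", Decimal("0.50")),
]

def group_sum(rows):
    """Model of group by ?suggestedItem with sum(?finalSimilarity)."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for _liked, suggested, _reason, value in rows:
        totals[suggested] += value
    return dict(totals)

def group_sum_distinct_values(rows):
    """Model of sum(distinct ?finalSimilarity): repeated values in a group count once."""
    values = defaultdict(set)
    for _liked, suggested, _reason, value in rows:
        values[suggested].add(value)
    return {item: sum(vals) for item, vals in values.items()}

# Single select: the duplicate row inflates book1's total.
one_query = group_sum(ROWS)
# Inner select distinct, then outer group by: only exact duplicate rows are dropped.
nested = group_sum(list(dict.fromkeys(ROWS)))
# sum(distinct ...): also loses the genuine third row, because its value repeats.
sum_distinct = group_sum_distinct_values(ROWS)
print(one_query, nested, sum_distinct, sep="\n")
```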
