compare
A collection of utilities for canonicalizing and inspecting graphs.
Among other things, they solve of the problem of deterministic bnode comparisons.
Warning: the time to canonicalize bnodes may increase exponentially on degenerate larger graphs. Use with care!
Example of comparing two graphs:
>>> g1 = Graph().parse(format='n3', data='''
... @prefix : <http://example.org/ns#> .
... <http://example.org> :rel
... <http://example.org/same>,
... [ :label "Same" ],
... <http://example.org/a>,
... [ :label "A" ] .
... ''')
>>> g2 = Graph().parse(format='n3', data='''
... @prefix : <http://example.org/ns#> .
... <http://example.org> :rel
... <http://example.org/same>,
... [ :label "Same" ],
... <http://example.org/b>,
... [ :label "B" ] .
... ''')
>>>
>>> iso1 = to_isomorphic(g1)
>>> iso2 = to_isomorphic(g2)
These are not isomorphic
Diff the two graphs:
Present in both:
>>> def dump_nt_sorted(g):
... for l in sorted(g.serialize(format='nt').splitlines()):
... if l: print(l.decode('ascii'))
>>> dump_nt_sorted(in_both) #doctest: +SKIP
<http://example.org>
<http://example.org/ns#rel> <http://example.org/same> .
<http://example.org>
<http://example.org/ns#rel> _:cbcaabaaba17fecbc304a64f8edee4335e .
_:cbcaabaaba17fecbc304a64f8edee4335e
<http://example.org/ns#label> "Same" .
Only in first:
>>> dump_nt_sorted(in_first) #doctest: +SKIP
<http://example.org>
<http://example.org/ns#rel> <http://example.org/a> .
<http://example.org>
<http://example.org/ns#rel> _:cb124e4c6da0579f810c0ffe4eff485bd9 .
_:cb124e4c6da0579f810c0ffe4eff485bd9
<http://example.org/ns#label> "A" .
Only in second:
>>> dump_nt_sorted(in_second) #doctest: +SKIP
<http://example.org>
<http://example.org/ns#rel> <http://example.org/b> .
<http://example.org>
<http://example.org/ns#rel> _:cb558f30e21ddfc05ca53108348338ade8 .
_:cb558f30e21ddfc05ca53108348338ade8
<http://example.org/ns#label> "B" .
Classes:
-
IsomorphicGraph–An implementation of the RGDA1 graph digest algorithm.
Functions:
-
graph_diff–Returns three sets of triples: “in both”, “in first” and “in second”.
-
isomorphic–Compare graph for equality.
-
similar–Checks if the two graphs are “similar”.
-
to_canonical_graph–Creates a canonical, read-only graph.
-
to_isomorphic–
__all__
module-attribute
__all__ = ['IsomorphicGraph', 'to_isomorphic', 'isomorphic', 'to_canonical_graph', 'graph_diff', 'similar']
Color
Color(nodes: List[IdentifiedNode], hashfunc: HashFunc, color: ColorItemTuple = (), hash_cache: HashCache = None)
Methods:
-
__str__– -
copy– -
discrete– -
distinguish– -
hash_color– -
key–
Attributes:
Source code in rdflib/compare.py
__str__
copy
discrete
distinguish
Source code in rdflib/compare.py
hash_color
hash_color(color: Optional[Tuple[ColorItem, ...]] = None) -> str
Source code in rdflib/compare.py
IsomorphicGraph
Bases: ConjunctiveGraph
An implementation of the RGDA1 graph digest algorithm.
An implementation of RGDA1 (publication below), a combination of Sayers & Karp’s graph digest algorithm using sum and SHA-256 http://www.hpl.hp.com/techreports/2003/HPL-2003-235R1.pdf and traces http://pallini.di.uniroma1.it, an average case polynomial time algorithm for graph canonicalization.
McCusker, J. P. (2015). WebSig: A Digital Signature Framework for the Web. Rensselaer Polytechnic Institute, Troy, NY. http://gradworks.umi.com/3727015.pdf
Methods:
-
__eq__–Graph isomorphism testing.
-
__hash__– -
__ne__–Negative graph isomorphism testing.
-
graph_digest–Synonym for IsomorphicGraph.internal_hash.
-
internal_hash–This is defined instead of
__hash__to avoid a circular recursion
Source code in rdflib/compare.py
__eq__
__hash__
__ne__
graph_digest
internal_hash
This is defined instead of __hash__ to avoid a circular recursion
scenario with the Memory store for rdflib which requires a hash lookup
in order to return a generator of triples.
Source code in rdflib/compare.py
graph_diff
Returns three sets of triples: “in both”, “in first” and “in second”.
Source code in rdflib/compare.py
isomorphic
Compare graph for equality.
Uses an algorithm to compute unique hashes which takes bnodes into account.
Example
>>> g1 = Graph().parse(format='n3', data='''
... @prefix : <http://example.org/ns#> .
... <http://example.org> :rel <http://example.org/a> .
... <http://example.org> :rel <http://example.org/b> .
... <http://example.org> :rel [ :label "A bnode." ] .
... ''')
>>> g2 = Graph().parse(format='n3', data='''
... @prefix ns: <http://example.org/ns#> .
... <http://example.org> ns:rel [ ns:label "A bnode." ] .
... <http://example.org> ns:rel <http://example.org/b>,
... <http://example.org/a> .
... ''')
>>> isomorphic(g1, g2)
True
>>> g3 = Graph().parse(format='n3', data='''
... @prefix : <http://example.org/ns#> .
... <http://example.org> :rel <http://example.org/a> .
... <http://example.org> :rel <http://example.org/b> .
... <http://example.org> :rel <http://example.org/c> .
... ''')
>>> isomorphic(g1, g3)
False
Source code in rdflib/compare.py
similar
Checks if the two graphs are “similar”.
Checks if the two graphs are “similar”, by comparing sorted triples where
all bnodes have been replaced by a singular mock bnode (the
_MOCK_BNODE).
This is a much cheaper, but less reliable, alternative to the comparison
algorithm in isomorphic.
Source code in rdflib/compare.py
to_canonical_graph
to_canonical_graph(g1: Graph, stats: Optional[Stats] = None) -> ReadOnlyGraphAggregate
Creates a canonical, read-only graph.
Creates a canonical, read-only graph where all bnode id:s are based on deterministical SHA-256 checksums, correlated with the graph contents.