---
name: pygraphistry-gfql
description: Construct and run GFQL graph queries in PyGraphistry using chain-list syntax OR Cypher strings. Covers pattern matching, hop constraints, predicates, let/DAG bindings, GRAPH constructors, and remote execution. Use when requests involve subgraph extraction, path-style matching, Cypher queries, or GPU/remote graph query workflows.
---
PyGraphistry GFQL
Doc routing (local + canonical)
- Route first through ../pygraphistry/references/pygraphistry-readthedocs-toc.md.
- Use ../pygraphistry/references/pygraphistry-readthedocs-top-level.tsv for section-level shortcuts.
- Scan ../pygraphistry/references/pygraphistry-readthedocs-sitemap.xml only when a needed page is missing.
- Use one batched discovery read before deep-page reads; avoid cat * and serial micro-reads.
- In user-facing answers, prefer canonical https://pygraphistry.readthedocs.io/en/latest/... links.
Two syntaxes, one entrypoint
g.gfql() accepts both chain-list (Python AST objects) and Cypher strings. It auto-detects the language from the argument type:
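```python
from graphistry import n, e_forward
g2 = g.gfql([n({'type': 'person'}), e_forward(), n()])  # list -> chain-list
g2 = g.gfql("MATCH (p:Person)-[r:KNOWS]->(q:Person) RETURN p.name, q.name")  # str -> Cypher
g2 = g.gfql(query_string, language="cypher")  # explicit language; query_string holds a Cypher query
```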
When to use which:
- Chain-list: programmatic composition and dynamic parameterization, e.g. when queries are built from code
- Cypher: readability, familiarity for Cypher users, and complex pattern matching with RETURN/ORDER BY/LIMIT
Quick start — chain-list
```python
from graphistry import n, e_forward
# person nodes -> 1 to 3 'transfers_to' hops -> nodes flagged risk=True
g2 = g.gfql([
    n({'type': 'person'}),
    e_forward({'relation': 'transfers_to'}, min_hops=1, max_hops=3),
    n({'risk': True})
])
```
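The following toy pure-Python model makes the hop semantics of the query above concrete on a hypothetical five-node graph. It does not use graphistry. The data, the `reachable_within` helper, and the BFS first-visit hop counting are all assumptions made for illustration. It also reports (start, end) pairs, whereas `g.gfql()` returns a filtered graph.

```python
from collections import deque

# Hypothetical toy graph: node attributes and (src, dst, relation) edges.
nodes = {
    "alice": {"type": "person", "risk": False},
    "bob": {"type": "person", "risk": False},
    "acct1": {"type": "account", "risk": False},
    "acct2": {"type": "account", "risk": True},
    "acct3": {"type": "account", "risk": True},
}
edges = [
    ("alice", "acct1", "transfers_to"),
    ("acct1", "acct2", "transfers_to"),
    ("bob", "acct3", "owns"),  # different relation, so it is never followed
]


def reachable_within(src, relation, min_hops, max_hops):
    """BFS over forward edges of one relation; returns {node: hop count} for hops in [min_hops, max_hops]."""
    seen = {src: 0}
    frontier = deque([src])
    while frontier:
        cur = frontier.popleft()
        if seen[cur] == max_hops:
            continue  # do not expand past max_hops
        for s, d, rel in edges:
            if s == cur and rel == relation and d not in seen:
                seen[d] = seen[cur] + 1
                frontier.append(d)
    return {v: h for v, h in seen.items() if min_hops <= h <= max_hops}


# n({'type': 'person'}) -> e_forward(relation, 1..3 hops) -> n({'risk': True})
matches = set()
for node_id, attrs in nodes.items():
    if attrs["type"] != "person":
        continue
    for dst in reachable_within(node_id, "transfers_to", 1, 3):
        if nodes[dst]["risk"]:
            matches.add((node_id, dst))

print(sorted(matches))  # [('alice', 'acct2')]
```

Bob's account is flagged risky, but it hangs off an `owns` edge, so the relation filter excludes it.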
Quick start — Cypher
```python
# Filter + project
g2 = g.gfql("MATCH (p:Person)-[r:KNOWS]->(q:Person) WHERE p.age > 30 RETURN p.name, q.name")
# Variable-length path: 1 to 3 hops
g2 = g.gfql("MATCH (a:Account)-[*1..3]->(m:Merchant) RETURN a, m")
# Parameterized: values for $cutoff and $top_n come from params=
g2 = g.gfql(
    "MATCH (n) WHERE n.score > $cutoff RETURN n.id, n.score ORDER BY n.score DESC LIMIT $top_n",
    params={"cutoff": 50, "top_n": 10}
)
# Alternative relationship types
g2 = g.gfql("MATCH (a:Person)-[:KNOWS|COLLABORATES_WITH]->(b:Person) RETURN a.name, b.name")
```
Cypher node labels and DataFrame columns
GFQL Cypher maps :Label to boolean columns named label__<Label>, not to a string type column. Prefer property filters, which are simpler and work with any column:
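```python
import graphistry
# Preferred: filter on a property column
g2 = g.gfql("MATCH (p) WHERE p.type = 'Person' AND p.age > 30 RETURN p.name")
# Alternative: add a boolean label column so (p:Person) matches
nodes['label__Person'] = nodes['type'] == 'Person'
g = graphistry.edges(edges, 'src', 'dst').nodes(nodes, 'id')
g2 = g.gfql("MATCH (p:Person) WHERE p.age > 30 RETURN p.name")
```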
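The label mapping can be modeled without graphistry. Each distinct type value becomes a boolean label__<Label> field, and (p:Label) keeps the rows where that field is True. This is a toy over plain dicts: `add_label_columns` and `match_label` are hypothetical helpers, and in this toy a row that lacks the column simply never matches.

```python
def add_label_columns(rows, type_col="type"):
    """Add a boolean label__<Label> field for every distinct value of type_col."""
    labels = {row[type_col] for row in rows}
    for row in rows:
        for label in labels:
            row[f"label__{label}"] = row[type_col] == label
    return rows


def match_label(rows, label):
    """Model of (p:Label): keep rows whose label__<Label> field is True (missing counts as False)."""
    return [row for row in rows if row.get(f"label__{label}", False)]


# Hypothetical node rows
rows = add_label_columns([
    {"id": 1, "type": "Person", "name": "Ann", "age": 34},
    {"id": 2, "type": "Company", "name": "Acme", "age": None},
    {"id": 3, "type": "Person", "name": "Bo", "age": 25},
])

# MATCH (p:Person) WHERE p.age > 30 RETURN p.name
print([r["name"] for r in match_label(rows, "Person") if r["age"] > 30])  # ['Ann']
```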
Supported Cypher clauses
- Full: MATCH, WHERE, RETURN, WITH, ORDER BY, SKIP, LIMIT, DISTINCT, CALL graphistry.*, GRAPH {}, USE
- Partial: OPTIONAL MATCH (bounded subset), UNWIND (top-level), UNION/UNION ALL (direct g.gfql() only)
- Not supported: CREATE, MERGE, DELETE, SET, REMOVE (GFQL is read-only)
Cypher functions
- Scalar: labels(), type(), keys(), properties(), abs(), sqrt(), coalesce(), substring(), tointeger(), tofloat(), toboolean(), tostring()
- Aggregation: count(), sum(), min(), max(), avg(), collect(), count(DISTINCT ...)
- Operators: =, <>, <, <=, >, >=, IN, STARTS WITH, ENDS WITH, CONTAINS, IS NULL, IS NOT NULL, AND, OR, NOT
GRAPH constructor (Cypher extension)
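```python
# Single constructor: build a graph from the matched pattern
subgraph = g.gfql("GRAPH { MATCH (a)-[r]->(b) WHERE a.risk_score > 7 }")
# Named graphs: g1 filters, g2 runs a degree call on g1, the final query reads g2
result = g.gfql("""
GRAPH g1 = GRAPH { MATCH (a)-[r]->(b) WHERE a.event_count > 100 }
GRAPH g2 = GRAPH { USE g1 CALL graphistry.degree.write() }
USE g2 MATCH (n) RETURN n.id, n.degree ORDER BY n.degree DESC LIMIT 10
""")
```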
Let/DAG bindings
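```python
from graphistry import n, e_forward, let, ref, gt
# 'high_risk' runs on the root graph; 'neighborhoods' expands from its output
result = g.gfql(let({
    'high_risk': n({'risk_score': gt(0.8)}),
    'neighborhoods': ref('high_risk', [e_forward(max_hops=2), n()])
}))
# output= selects which binding's result is returned
result = g.gfql(let({...}), output='neighborhoods')
```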
```python
# Chained refs: people -> contacts -> owned; return only 'owned'
result = g.gfql(let({
    'people': n({'type': 'person'}),
    'contacts': ref('people', [e_forward({'rel': 'contacts'}), n()]),
    'owned': ref('contacts', [e_forward({'rel': 'owns'}), n()])
}), output='owned')
```
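The dependency order between bindings can be sketched with the standard library's graphlib. Each binding names the binding it refs (or None for the root graph) and is evaluated after that binding. This toy tracks node sets instead of subgraphs and is not GFQL's executor. The data and the `hop` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: node types and (src, dst, rel) edges.
node_type = {"p1": "person", "p2": "person", "c1": "contact", "x1": "asset"}
edges = [("p1", "c1", "contacts"), ("c1", "x1", "owns"), ("p2", "x1", "owns")]


def hop(frontier, rel):
    """Nodes reached by one forward hop from frontier over edges labeled rel."""
    return {d for s, d, r in edges if s in frontier and r == rel}


# name -> (binding it refs, or None for the root graph; step applied to that input)
bindings = {
    "people": (None, lambda src: {v for v in src if node_type[v] == "person"}),
    "contacts": ("people", lambda src: hop(src, "contacts")),
    "owned": ("contacts", lambda src: hop(src, "owns")),
}


def evaluate(bindings, root):
    """Evaluate each binding after the binding it references."""
    deps = {name: ({ref} if ref else set()) for name, (ref, _) in bindings.items()}
    results = {}
    for name in TopologicalSorter(deps).static_order():
        ref, step = bindings[name]
        results[name] = step(results[ref] if ref else root)
    return results


results = evaluate(bindings, set(node_type))
print(results["owned"])  # {'x1'} (reached via p1 -> c1 -> x1)
```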
```python
# Nested lets: each inner let is its own scope
result = g.gfql(let({
    'social': let({
        'people': n({'type': 'person'}),
        'friends': ref('people', [e_forward({'rel': 'knows'}), n()]),
    }),
    'infra': let({
        'servers': n({'type': 'server'}),
        'traffic': ref('servers', [e_forward({'rel': 'serves'}), n()]),
    }),
    'combined': ref('social', [e_forward(), n()])  # outer binding reading the 'social' result
}), output='combined')
```
```python
from graphistry import n, e_forward, let, ref, call
result = g.gfql(let({
    'seeds': n({'risk_flag': True}),
    'neighborhood': ref('seeds', [e_forward(max_hops=2), n()]),
}))
# Post-process the returned graph with regular methods: add degrees, color by degree
result = result.get_degrees().encode_point_color('degree', as_continuous=True)
```
- Bindings that do not use ref() run against the root graph.
- ref() bindings run against the output of the binding they reference.
- Nested let scope rules (requires pygraphistry >= 0.53.7):
  - Inner bindings do NOT leak to the outer scope.
  - Inner bindings CAN read outer bindings (lexical closure).
  - Sibling nested lets may reuse names without collision.
  - Each nested let is an opaque execution unit (parallel-friendly).
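The nested-let rules above can be illustrated with a minimal lexical-scope model. It assumes each nested let behaves like a child scope that holds a pointer to its parent. `Scope` is a hypothetical class and is not part of graphistry.

```python
from typing import Any, Dict, Optional


class Scope:
    """Toy lexical scope: bindings live in one scope, lookups walk outward to parents."""

    def __init__(self, parent: Optional["Scope"] = None):
        self.parent = parent
        self.values: Dict[str, Any] = {}

    def bind(self, name: str, value: Any) -> None:
        self.values[name] = value

    def lookup(self, name: str) -> Any:
        scope = self
        while scope is not None:
            if name in scope.values:
                return scope.values[name]
            scope = scope.parent
        raise KeyError(name)


outer = Scope()
outer.bind("seed", {"n1", "n2"})

social = Scope(parent=outer)  # first nested let
social.bind("people", social.lookup("seed") | {"n3"})  # inner reads outer: closure

infra = Scope(parent=outer)  # sibling nested let
infra.bind("people", {"srv1"})  # same name as in social, no collision

try:
    outer.lookup("people")  # inner bindings do not leak outward
except KeyError:
    print("'people' is not visible in the outer scope")
```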
Targeted patterns (high signal)
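```python
from graphistry import n, e_forward, col, compare
# Edge filter as a pandas-style query string
g2 = g.gfql([n(), e_forward(edge_query="type == 'replied_to' and submolt == 'X'"), n()])
# Cross-step constraint: endpoints 'a' and 'b' must share owner_id
g2 = g.gfql([n(name='a'), e_forward(name='e'), n(name='b')], where=[compare(col('a', 'owner_id'), '==', col('b', 'owner_id'))])
# Traverse 2 to 4 hops, but return only hops 3 to 4
g2 = g.gfql([e_forward(min_hops=2, max_hops=4, output_min_hops=3, output_max_hops=4)])
```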
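The following toy model covers the two less obvious options above: the traversal window versus the returned slice, and a cross-step where constraint. It uses BFS distance from a single start node, so hop counts are shortest-path counts. GFQL tracks the edges along paths, so treat this only as intuition. The graph, the owner map, and the helper names are hypothetical.

```python
from collections import deque

# Hypothetical path graph a -> b -> c -> d -> e -> f
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]


def hop_distances(src, max_hops):
    """BFS hop distance from src along forward edges, not expanding past max_hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if dist[cur] == max_hops:
            continue
        for s, d in edges:
            if s == cur and d not in dist:
                dist[d] = dist[cur] + 1
                queue.append(d)
    return dist


dist = hop_distances("a", max_hops=4)
window = {v for v, h in dist.items() if 2 <= h <= 4}  # min_hops=2, max_hops=4
returned = {v for v, h in dist.items() if 3 <= h <= 4}  # output_min_hops=3, output_max_hops=4
print(sorted(window), sorted(returned))  # ['c', 'd', 'e'] ['d', 'e']

# where=[compare(col('a', 'owner_id'), '==', col('b', 'owner_id'))] as a filter on (a, b) pairs
owner_id = {"a": 1, "b": 1, "c": 2}
pairs = [("a", "b"), ("b", "c"), ("c", "a")]
same_owner = [(x, y) for x, y in pairs if owner_id[x] == owner_id[y]]
print(same_owner)  # [('a', 'b')]
```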
Edge direction variants
- e_forward(): source-to-destination
- e_reverse(): destination-to-source
- e_undirected(): both directions
- e(): generic edge step; choose the direction with its direction= parameter ('forward', 'reverse', or 'undirected')
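The three directions can be captured in one toy function. Forward follows src to dst, reverse follows dst to src, and undirected takes both. `step` is a hypothetical helper run over a two-edge example, not graphistry code.

```python
# Hypothetical edges: a -> b and c -> a
edges = [("a", "b"), ("c", "a")]


def step(frontier, direction):
    """One toy hop from frontier: 'forward', 'reverse', or 'undirected'."""
    reached = set()
    for src, dst in edges:
        if direction in ("forward", "undirected") and src in frontier:
            reached.add(dst)  # follow src -> dst
        if direction in ("reverse", "undirected") and dst in frontier:
            reached.add(src)  # follow dst -> src
    return reached


for direction in ("forward", "reverse", "undirected"):
    print(direction, sorted(step({"a"}, direction)))
# forward ['b']
# reverse ['c']
# undirected ['b', 'c']
```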
High-value patterns
g.gfql() is the unified entrypoint — pass chain-lists OR Cypher strings.
- NEVER use .chain() or .hop(): they are deprecated and emit warnings. Always use g.gfql([...]) for chain-list syntax or g.gfql("MATCH ...") for Cypher.
- When the user explicitly asks for GFQL, final snippets must include an explicit .gfql(...) call.
- When the task mentions remote execution or a remote dataset, use gfql_remote(...).
- Use name= labels on intermediate matches when you need to constrain them.
- Use where=[...] for cross-step/path constraints.
- Use min_hops/max_hops to bound the traversal and output_min_hops/output_max_hops to choose the returned slice.
- Use predicates (is_in, numeric/date predicates) for concise filtering.
- Use engine='auto' by default; force cudf/pandas only when needed.
Remote mode
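```python
import graphistry
from graphistry import n, e_forward, let
rg = graphistry.bind(dataset_id='my-dataset')  # reference an existing server-side dataset
res = rg.gfql_remote([n(), e_forward(), n()], engine='auto')  # chain-list
res = rg.gfql_remote("MATCH (n:Person)-[r]->(m) WHERE n.risk_level = 'critical' RETURN n, r, m")  # Cypher
res = rg.gfql_remote(let({...}))  # let/DAG bindings
res = rg.gfql_remote([n(), e_forward(), n()], output_type='nodes', node_col_subset=['node_id', 'time'])  # only the columns you need
res = rg.python_remote_table(lambda g: g._edges[['src', 'dst']].head(1000))  # run Python remotely, get a table back
```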
Validation and safety
- Validate user-derived query fragments before execution.
- Normalize datetime columns before temporal predicates.
- Prefer small column subsets for remote result transfer.
- Preflight Cypher before execution:
  ```python
  from graphistry.compute.gfql.cypher import parse_cypher, compile_cypher
  ```
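The next sketch covers the first two bullets for a Cypher query whose property name comes from user input. It checks the name against an allowlist and an identifier pattern, passes values through params, and normalizes timestamps to UTC-aware datetimes before temporal comparisons. `build_score_query` and `to_utc` are hypothetical helpers that use only the standard library. For DataFrame columns, pandas.to_datetime(..., utc=True) plays the role of `to_utc`.

```python
import re
from datetime import datetime, timezone

SAFE_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_score_query(prop, allowed, cutoff):
    """Return (cypher, params) for a user-chosen property; reject anything not allowlisted."""
    if prop not in allowed or not SAFE_IDENT.fullmatch(prop):
        raise ValueError(f"disallowed property: {prop!r}")
    if isinstance(cutoff, bool) or not isinstance(cutoff, (int, float)):
        raise TypeError("cutoff must be a number")
    # Only the validated identifier is spliced into the text; the value travels in params.
    query = f"MATCH (n) WHERE n.{prop} > $cutoff RETURN n.id, n.{prop}"
    return query, {"cutoff": cutoff}


def to_utc(value):
    """Parse an ISO-8601 string to an aware UTC datetime; naive input is assumed to be UTC."""
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


query, params = build_score_query("score", {"score", "age"}, 50)
print(query, params)
print(to_utc("2024-05-01T12:00:00+02:00"))  # 2024-05-01 10:00:00+00:00
```

The resulting pair can be passed as g.gfql(query, params=params), matching the parameterized form shown in the Cypher quick start.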
Canonical docs