Data-housekeeping needed? Automatically rebuild your knowledge graph (Neo4j)!
I’m a big Neo4j fan, and the Community Edition is wonderful for trying out new ideas. With Docker in particular, it’s super simple to get started. However, it comes (surprise, surprise) with plenty of limitations compared to the Enterprise Edition.
Besides having to shut the database down for a backup, the lack of fine-grained user permissions, and much more, I find one missing feature especially annoying: unused labels are never removed completely, so they accumulate over time (you can see them on the left side of the Neo4j Browser).
A solution is to recreate the whole graph, but how?
What’s needed to recreate the graph, and how can this be automated?
Basic idea
Instead of doing a full backup and loading everything (also the unused labels), the idea is to
- export “the data” (nodes and relationships)
- delete the database
- import “the data”
Sounds straightforward, right? Sadly, it came with a bunch of challenges. For example, don’t forget your unique constraints 😉
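To make the order of operations concrete, here is a minimal sketch of the three steps as a plan of commands. It is not necessarily the exact procedure used later in this article. It assumes that the export is written with `apoc.export.cypher.all`, that "deleting the database" means removing the store files of the default `neo4j` database while the container is stopped, and that the re-import is done with `cypher-shell -f`. The names `Step` and `rebuild_plan`, the file name `export.cypher` and the container path `/var/lib/neo4j/import` are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    kind: str      # "cypher": run against Neo4j; "shell": run on the Docker host
    command: str


def rebuild_plan(export_file: str = "export.cypher",
                 service: str = "neo4j",
                 data_dir: str = "./neo4j/data",
                 database: str = "neo4j") -> list[Step]:
    """Return the export / delete / import steps in the order they must run."""
    # Assumption: with apoc.import.file.use_neo4j_config=true, APOC writes the
    # export into the container's import directory.
    import_path = f"/var/lib/neo4j/import/{export_file}"
    return [
        # 1. Export nodes and relationships as Cypher statements.
        Step("export", "cypher",
             f"CALL apoc.export.cypher.all('{export_file}', {{format: 'cypher-shell'}})"),
        # 2. Delete the database: stop Neo4j, remove the store files, start it again.
        Step("stop", "shell", f"docker compose stop {service}"),
        Step("delete", "shell",
             f"rm -rf {data_dir}/databases/{database} {data_dir}/transactions/{database}"),
        Step("start", "shell", f"docker compose start {service}"),
        # 3. Import the exported statements into the fresh, empty database.
        Step("import", "shell",
             f"docker compose exec {service} cypher-shell -u neo4j -p password -f {import_path}"),
    ]


if __name__ == "__main__":
    for step in rebuild_plan():
        print(f"[{step.kind}] {step.name}: {step.command}")
```

The plan only builds the commands and does not execute anything, so it can be reviewed before it touches any data. Since the delete step is destructive, it is worth checking first that the export file really contains the constraints.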
Some words on my setup
So let’s start with my setup. I use the neo4j:5.20.0 Docker image, managed via docker-compose. To secure it, I use Traefik (but that’s another story).
neo4j:
  image: neo4j:5.20.0
  environment:
    - NEO4J_AUTH=neo4j/password
    - NEO4J_apoc_export_file_enabled=true
    - NEO4J_apoc_import_file_enabled=true
    - NEO4J_apoc_import_file_use__neo4j__config=true
    - NEO4J_apoc_uuid_enabled=true
    - NEO4J_PLUGINS='["apoc","apoc-extended"]'
  ports:
    - 7474:7474
    - 7687:7687
    - 7473:7473
  labels:
    - "traefik.enable=true"
    - "traefik.http.routers.neo4j.rule=Host(`${DOMAIN}`) && PathPrefix(`/browser`)"
    - "traefik.http.routers.neo4j.entrypoints=websecure"
    - "traefik.http.routers.neo4j.service=neo4j"
    - "traefik.http.services.neo4j.loadbalancer.server.port=7474"
  extra_hosts:
    - "host.docker.internal:host-gateway"
  volumes:
    - "./neo4j/data:/data"
    - "./neo4j/conf:/conf"
    - "./neo4j/logs:/logs"
…