Hi!
So, I have a pretty interesting challenge.
Basically, I need to make a dump of a repository, but this repository contains some information that I can't bring along in the dump.
For example:
Content-type A contains an email field and content-type B contains a credit card field (among others).
When I make a dump of the repository that contains the data created from those content-types, I (somehow) have to replace those fields with a different email and credit card number; I can't take the real values.
I'd like to know if any of you have an idea whether this is possible and how I could do it.
Examine any example dump file. All nodes are located in a hierarchy underneath /node at the root level.
Examine the content inside one of these node files that are deep in the hierarchy underneath /node, and you'll see that all these nodes are stored as machine-readable pure text, containing a JSON-like data structure.
This means that you can write a script that traverses all files inside a dump archive and that hashes or removes the sensitive data using a replace function. The regular expressions might get a little complicated, but the whole process should not be anything out of the ordinary.
For instance, let's say I want to strip away the value of the field "owner". When I examine a content node, the data is stored in this format: …{"name":"owner","type":"String","values":[{"v":"user:system:bhj"}]}…
If I want to strip away user:system:bhj from the example above, my regular expression could look like this if my regex engine supports lookbehind/lookahead: (?<="name":"owner","type":"String","values":\[\{"v":").*?(?=")
Or if my regex engine does not support lookbehind/lookahead, I can use a regex capture group: "name":"owner","type":"String","values":\[\{"v":"([^"]*)"
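As a rough sketch of how the two patterns above could be applied, here is a small Python example. The replacement value `user:system:masked` and the function names are made up for illustration. It also assumes the value contains no escaped quotes (`\"`) and that only the first entry in `values` needs masking; real node files may need a more careful pattern.

```python
import re

# Lookbehind/lookahead variant from the post. Python's re module accepts it
# because the lookbehind has a fixed width.
LOOKAROUND = re.compile(
    r'(?<="name":"owner","type":"String","values":\[\{"v":").*?(?=")'
)

# Capture-group variant: group 1 is the prefix to keep, group 2 the old value.
CAPTURE = re.compile(
    r'("name":"owner","type":"String","values":\[\{"v":")([^"]*)"'
)

# Hypothetical placeholder value, not something the dump format defines.
REPLACEMENT = "user:system:masked"


def mask_owner_lookaround(text: str) -> str:
    # The match is only the value itself, so it can be swapped directly.
    # A function is used as the replacement so it is inserted literally.
    return LOOKAROUND.sub(lambda m: REPLACEMENT, text)


def mask_owner_capture(text: str) -> str:
    # The match includes the prefix and closing quote, so both are re-emitted.
    return CAPTURE.sub(lambda m: m.group(1) + REPLACEMENT + '"', text)


sample = '{"name":"owner","type":"String","values":[{"v":"user:system:bhj"}]}'
print(mask_owner_lookaround(sample))
print(mask_owner_capture(sample))
```

Both functions should give the same result for the sample line; which one to use mostly depends on what the regex engine in your scripting language supports.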
Might be some confusion here? I believe Bjørnar is talking about export, but you mention dumps? Which is it? Also, afaik XP does not support dumping of single repos yet.
Sometimes I'm going to need to bring the production repository over to dev. What I need is an "automagic" way to (… let's say) insert a filter when dumping the production repository, to mask/replace/modify certain pieces of information in a node.
The closest thing I've thought of is to make a copy of the repository on the production server and process each file individually, as @bhj said. And maybe create an app so the user only has to input which information should be processed.
I'm certain that this is not the best idea, but it is something.
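The "process each file individually" idea could be sketched roughly like this in Python. Everything here is an assumption for illustration: that the copy has been unpacked to a directory, that the sensitive fields are of type `String` and stored in the format shown earlier in the thread, and that the user supplies the field names. The function names (`build_pattern`, `pseudonym`, `mask_text`, `mask_dump`) are hypothetical helpers, not part of XP.

```python
import hashlib
import re
from pathlib import Path


def build_pattern(field_name: str) -> re.Pattern:
    # Same shape as the capture-group regex earlier in the thread,
    # with the field name filled in (assumes type "String").
    prefix = (r'"name":"' + re.escape(field_name)
              + r'","type":"String","values":\[\{"v":"')
    return re.compile("(" + prefix + ')([^"]*)(")')


def pseudonym(value: str) -> str:
    # Same input always gives the same output, so references stay consistent.
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "masked-" + digest[:12]


def mask_text(text: str, patterns: list) -> str:
    for pattern in patterns:
        text = pattern.sub(
            lambda m: m.group(1) + pseudonym(m.group(2)) + m.group(3), text
        )
    return text


def mask_dump(root: Path, field_names: list) -> int:
    """Mask the given fields in every text file below root; return files changed."""
    patterns = [build_pattern(name) for name in field_names]
    changed = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            original = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files
        masked = mask_text(original, patterns)
        if masked != original:
            path.write_bytes(masked.encode("utf-8"))
            changed += 1
    return changed
```

You would call it on a copy, never on the live data, e.g. `mask_dump(Path("copy-of-dump"), ["email", "creditCard"])`, where the directory and field names are placeholders. Note that a plain unsalted hash of an email or card number can be guessed by brute force, so for truly sensitive values a random or fixed fake value is probably safer than a hash.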
Makes perfect sense.
We have a backlog task related to supporting annotations of properties, i.e. Sensitive/privacy etc, and support obfuscation of these values when needed. However, this is not in our 2020 list…