Custom data dump


#1

Hi!
So, I have a pretty interesting challenge.
Basically, I need to make a dump of a repository, but :cry: this repository contains some information that I can’t bring along in the dump.

For example:

Content type A contains an email field and content type B contains a credit card field (among others).
When I make a dump of the repository that holds the data created from those content types, I somehow have to replace those fields with a different email and credit card number; I can’t take the real values along. :sob:

I’d like to know if any of you have an idea whether this is possible and how I could do it.

Thanks! :grinning:

Enonic version: XP 7
OS: Ubuntu 18


#2

Hi, Wally!

Examine any example dump file. All nodes are located in a hierarchy underneath /node at the root level.

Open one of the node files deep in the hierarchy underneath /node, and you’ll see that each node is stored as plain, machine-readable text containing a JSON-like data structure.

This means that you can write a script that traverses all files inside a dump archive and that hashes or removes the sensitive data using a replace function. The regular expressions might get a little complicated, but the whole process should not be anything out of the ordinary.
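
To make the traversal part concrete, here is a minimal Python sketch. It assumes the dump archive has already been unpacked into a directory and that node files are UTF-8 text; the function name `rewrite_dump_files` and the `transform` callback are made up for illustration and are not part of any Enonic tooling.

```python
from pathlib import Path
from typing import Callable


def rewrite_dump_files(root: Path, transform: Callable[[str], str]) -> int:
    """Apply `transform` to every text file under `root`.

    Returns the number of files whose content actually changed.
    """
    changed = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            # Assumption: node files are UTF-8 encoded text.
            original = path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            # Anything that is not valid UTF-8 is treated as binary and left alone.
            continue
        updated = transform(original)
        if updated != original:
            path.write_bytes(updated.encode("utf-8"))
            changed += 1
    return changed
```

You would call it with the unpacked dump directory and whatever replace function you end up writing, for example `rewrite_dump_files(Path("unpacked-dump"), my_replace)`. Run it on a copy, since files are rewritten in place.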


#3

For instance, let’s say I want to strip away the value of the field “owner”. When I examine a content node, the data is stored in this format:
…{"name":"owner","type":"String","values":[{"v":"user:system:bhj"}]}…

If I want to strip away user:system:bhj from the example above, my regular expression could look like this if my regex engine supports lookbehind/lookahead:
(?<="name":"owner","type":"String","values":\[\{"v":").*?(?=")

Or if my regex engine does not support lookbehind/lookahead, I can use a regex capture group:
"name":"owner","type":"String","values":\[\{"v":"([^"]*)"

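Both variants can be tried out in Python, whose `re` module supports lookbehind as long as it is fixed-width (which this one is). This is only a sketch against the single sample fragment above; the function names are made up, and neither pattern copes with values that contain an escaped quote (`\"`).

```python
import re

# The sample fragment from the post above.
sample = '{"name":"owner","type":"String","values":[{"v":"user:system:bhj"}]}'

# Variant 1: lookbehind/lookahead, so only the value itself is matched.
LOOKAROUND = re.compile(
    r'(?<="name":"owner","type":"String","values":\[\{"v":").*?(?=")'
)

# Variant 2: no lookaround, the value is captured in group 1.
CAPTURE = re.compile(r'"name":"owner","type":"String","values":\[\{"v":"([^"]*)"')


def strip_with_lookaround(text: str) -> str:
    # The match is just the value, so replacing it with "" removes it.
    return LOOKAROUND.sub("", text)


def strip_with_capture(text: str) -> str:
    def drop_group(match: re.Match) -> str:
        # The match includes the surrounding JSON, so cut out only group 1.
        whole = match.group(0)
        start = match.start(1) - match.start(0)
        end = match.end(1) - match.start(0)
        return whole[:start] + whole[end:]

    return CAPTURE.sub(drop_group, text)


print(strip_with_lookaround(sample))
print(strip_with_capture(sample))
```

Both calls leave the structure intact and only empty the value, so the node should still parse as before.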

#4

There might be some confusion here. I believe Bjørnar is talking about exports, but you mention dumps. Which is it? Also, AFAIK XP does not support dumping of single repos yet :slight_smile:


#5

Sometimes I’m going to need to bring the production repository over to dev. What I need is an “automagic” way to (… let’s say) insert a filter when dumping the production repository, to mask/replace/modify certain pieces of information in a node.
The closest thing I have thought of is to make a copy of the repository on the production server and process each file individually, as @bhj said. And maybe create an app so the user only has to input which information should be processed.
I’m certain that this is not the best idea, but it is something. :cry:
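
The "user only inputs what should be processed" part could look roughly like the Python sketch below: given a list of field names, it builds a function that replaces each matching value with a salted hash, so the same real value always maps to the same fake one. Everything here is an assumption for illustration: the field names `email` and `creditCard`, the `build_masker` helper, and the idea that every sensitive field is a single-valued String stored in the format shown in post #3.

```python
import hashlib
import re
from typing import Callable, Iterable


def build_masker(field_names: Iterable[str], salt: str) -> Callable[[str], str]:
    """Return a function that pseudonymizes the listed fields in node text."""
    # One pattern per field: group 1 = prefix, group 2 = value, group 3 = closing quote.
    patterns = [
        re.compile(
            r'("name":"' + re.escape(name)
            + r'","type":"String","values":\[\{"v":")([^"]*)(")'
        )
        for name in field_names
    ]

    def pseudonym(match: re.Match) -> str:
        # Salted SHA-256, shortened to 16 hex characters; stable for a given salt.
        digest = hashlib.sha256((salt + match.group(2)).encode("utf-8")).hexdigest()
        return match.group(1) + digest[:16] + match.group(3)

    def mask(text: str) -> str:
        for pattern in patterns:
            text = pattern.sub(pseudonym, text)
        return text

    return mask


# Hypothetical field names; use whatever your content types actually define.
mask = build_masker(["email", "creditCard"], salt="keep-this-secret")
```

Note that the replacement is a hex string, not a valid email address or card number, so if dev code validates those formats you would need to generate fake values of the right shape instead.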


#6

Makes perfect sense.
We have a backlog task for supporting annotations on properties (e.g. sensitive/privacy) and obfuscating those values when needed. However, this is not on our 2020 list…


#7

I understand.
Do you have any idea of a better way for me to achieve this for now?