Extract Flash/Articulate contents into XML – options and workflow

One of the big challenges the e-learning industry faces today is how to efficiently convert the bulk of Flash-based e-learning courses in existing course inventories into HTML5, so that they work on mobile devices such as the iPad. If there are only a few courses, developers can copy and paste, extract the available information and repurpose it into HTML5.

In reality, though, most organizations end up with thousands of traditional e-learning courses, and making them all mobile friendly would cost a fortune. If you are an experienced e-learning developer, you know the value of storing and retrieving all content in XML.

Keeping all the content in XML gives us a lot of flexibility to change the front end (for example, from Flash to HTML/HTML5). XML is a kind of universal storage format: once all your content is in XML, you no longer need to worry about changing trends, technologies and tools. You can always find ways to reuse it and display it in various formats using almost any programming language available today.

Based on my experience, I recommend here a few possible workflows for efficiently extracting the content of a large number of traditional Flash-based e-learning courses into XML.

I must mention that this post only describes a proposed workflow (not an SOP), which will require considerable further research and development before the actual conversion or implementation. But the effort invested in developing a reusable workflow will pay for itself and save a tremendous amount of money when extracting content from large numbers of e-learning courses.


Here is the possible workflow:


Figure: Bulk/mass Flash e-learning to XML/HTML5 conversion workflow


I will go one step at a time.

Scenario 1 – Your existing e-learning courses are Articulate-based and you have the source files (.ppt) available

Option 1 – If you have the Articulate course source .ppt files, you can save the slides in MS Word (.doc) format. Once all the content is consolidated into Word files, a VBA developer can write macros to export it to XML. There are also plenty of user-friendly tools on the market for exporting Word content to XML, but custom VBA programming is very effective for producing exactly the XML output structure you want.
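The post suggests VBA macros for this step. To illustrate the same idea outside Word, here is a minimal Python sketch, assuming the Word files are saved as .docx (Office Open XML, a zip archive whose main text lives in word/document.xml) rather than the older binary .doc. The function name, the output schema (<course>, <screen title="...">, <para>) and the rule that a paragraph whose style ID starts with "Heading" opens a new screen are all hypothetical choices, not part of any Articulate or Microsoft tooling.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def docx_to_course_xml(docx_path: str) -> ET.Element:
    """Convert a .docx into a hypothetical <course>/<screen>/<para> tree."""
    with zipfile.ZipFile(docx_path) as zf:
        doc = ET.fromstring(zf.read("word/document.xml"))

    course = ET.Element("course")
    screen = None
    for p in doc.iter(W + "p"):
        # A paragraph's text is split across runs; join all <w:t> pieces.
        text = "".join(t.text or "" for t in p.iter(W + "t")).strip()
        if not text:
            continue
        style = p.find(f"{W}pPr/{W}pStyle")
        style_id = style.get(W + "val", "") if style is not None else ""
        if style_id.startswith("Heading"):
            # Assumption: each heading marks the start of a new slide/screen.
            screen = ET.SubElement(course, "screen", title=text)
        else:
            if screen is None:
                # Body text before the first heading gets an untitled screen.
                screen = ET.SubElement(course, "screen", title="")
            ET.SubElement(screen, "para").text = text
    return course
```

Usage might look like `ET.ElementTree(docx_to_course_xml("module1.docx")).write("module1.xml", encoding="utf-8", xml_declaration=True)` (file names hypothetical). Keying screens off heading styles is just one convention; a VBA or Python converter can equally split on page breaks or on custom styles your team already uses.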

Option 2 – This is probably the simpler option, but you need the Articulate Storyline tool for it. You can import the Articulate source files into Storyline and export the content directly to XML from there.
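The post does not say which XML format Storyline produces. As a hedged sketch, assume the exported file is XLIFF 1.2 (the OASIS translation-exchange format, which uses <trans-unit> elements with a <source> child); check what your Storyline version actually writes before relying on this. The function below is a hypothetical helper that pulls the source text out of such a file so it can be fed into your own XML structure.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace (OASIS standard); assumed format of the export.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}


def read_xliff_sources(xliff_path: str) -> list[tuple[str, str]]:
    """Return (trans-unit id, source text) pairs from an XLIFF 1.2 file."""
    root = ET.parse(xliff_path).getroot()
    pairs = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        if source is None:
            continue
        # itertext() flattens inline markup such as <g> or <x/> into plain text.
        text = "".join(source.itertext()).strip()
        if text:  # skip empty or whitespace-only units
            pairs.append((unit.get("id", ""), text))
    return pairs
```

Keeping the trans-unit id alongside the text makes it possible to trace each string back to its slide object later, which is handy when re-importing corrected text.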


Scenario 2 – Your existing e-learning courses are Articulate-based but you don’t have the source files available

This is a more complex case, but you can still save a significant amount of time by extracting content from the published output files instead of recreating everything by hand. If your course is a mix of Presenter, Quizmaker and Engage content (a typical Articulate course), your extraction burden will be reduced to some extent: Engage and Quizmaker already store their content in XML, so you can simply reuse it.
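Since the post does not describe the Engage or Quizmaker XML schemas, a reasonable first step is simply to survey which XML files a published course contains and what text they hold, before writing a format-specific converter. This is a minimal Python sketch under that assumption; the function name is hypothetical, and it only collects element text (text stored in attributes is not captured).

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def inventory_xml_text(output_dir: str) -> dict[str, list[str]]:
    """Map each parseable .xml file under output_dir to its non-blank text."""
    result = {}
    for path in sorted(Path(output_dir).rglob("*.xml")):
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            # Skip malformed files or files that only carry an .xml extension.
            continue
        texts = [t.strip() for t in root.itertext() if t.strip()]
        result[str(path.relative_to(output_dir))] = texts
    return result
```

Running this over each published course folder gives a quick picture of which files carry learner-facing text, which helps decide where a dedicated converter is worth writing.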

Presenter, however, stores its slide content in .swf files. For these you can use an SWF decompiler such as Sothink SWF Decompiler to export each .swf as an .fla file. The newer .fla format (Flash CS5 and later) already stores all the information as XML: rename the .fla to .zip and extract the contents.
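Because a CS5-style .fla is a zip archive of XML files, the text can also be pulled out programmatically. This is a minimal Python sketch; the function and output element names are hypothetical, and the XFL element names it matches (DOMStaticText, DOMDynamicText, DOMInputText, with text split into runs that each contain a <characters> element) reflect the usual XFL layout but should be treated as assumptions and checked against your own DOMDocument.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed XFL text-field element names (Flash CS5 and later .fla files).
TEXT_FIELDS = {"DOMStaticText", "DOMDynamicText", "DOMInputText"}


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so matching ignores the XFL namespace."""
    return tag.rsplit("}", 1)[-1]


def extract_fla_text(fla_path: str) -> ET.Element:
    """Collect the text of every text field in a zipped .fla into <texts>."""
    out = ET.Element("texts")
    # zipfile reads the archive directly; renaming .fla to .zip is only
    # needed when browsing it with an ordinary archive tool.
    with zipfile.ZipFile(fla_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".xml"):
                continue  # skip binary assets such as bin/*.dat
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                continue
            for field in root.iter():
                kind = local_name(field.tag)
                if kind not in TEXT_FIELDS:
                    continue
                # Each field's text is split into runs, each with <characters>.
                runs = [c.text or "" for c in field.iter()
                        if local_name(c.tag) == "characters"]
                # Normalise carriage-return line breaks to "\n".
                text = "".join(runs).replace("\r", "\n").strip()
                if text:
                    ET.SubElement(out, "text", source=name, field=kind).text = text
    return out
```

For example, `ET.ElementTree(extract_fla_text("slide1.fla")).write("slide1_text.xml", encoding="utf-8")` (file names hypothetical). Recording the source file (DOMDocument.xml or a LIBRARY symbol) with each string keeps the extracted text traceable to where it appeared in the decompiled movie.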


Figure: Extracting text from the .fla DOM XML

Figure: Extracted .fla contents, browsed like a zip file


Scenario 3 – You have custom Flash courses (developed with Adobe Flash)

XML-based courses – Analyze the course files. Probably at least half of custom Flash courses use XML to store their content, and more advanced ones even offer options to customize the interface and layouts. You can reuse these XML files directly.
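Reusing existing course XML for a mobile front end can be as simple as rendering it into HTML5. This minimal Python sketch assumes a hypothetical schema (a <course title="..."> root containing <screen title="..."> elements, each holding <para> elements); real course XML will differ, so the element names would need adapting. It escapes all text with html.escape so content such as "<" or "&" cannot break the generated markup.

```python
import html
import xml.etree.ElementTree as ET


def course_xml_to_html(xml_text: str) -> str:
    """Render a hypothetical <course>/<screen>/<para> XML string as HTML5."""
    course = ET.fromstring(xml_text)
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en"><head><meta charset="utf-8">',
        '<meta name="viewport" content="width=device-width, initial-scale=1">',
        f"<title>{html.escape(course.get('title', 'Course'))}</title></head><body>",
    ]
    for i, screen in enumerate(course.findall("screen"), start=1):
        # One <section> per screen, numbered so navigation can target it.
        parts.append(f'<section id="screen-{i}">')
        title = screen.get("title", "")
        if title:
            parts.append(f"<h2>{html.escape(title)}</h2>")
        for para in screen.findall("para"):
            parts.append(f"<p>{html.escape(para.text or '')}</p>")
        parts.append("</section>")
    parts.append("</body></html>")
    return "\n".join(parts)
```

The viewport meta tag is what lets the page scale sensibly on devices such as the iPad; styling and navigation would be layered on top with CSS and JavaScript.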

Non-XML-based courses – If the custom Flash courses do not use XML to store content, the content is hardcoded into the Flash files themselves. In this case you can use JSFL (the JavaScript-based Flash automation language) to extract the content into XML.

