I need to transform large XML files that have a nested (hierarchical) structure of the form
<Root>
Flat XML
Hierarchical XML (multiple blocks, some repetitive)
Flat XML
</Root>
into a flatter ("shredded") form, with one block for each repeated nested block.
The data contains many different tags and hierarchy variations (in particular, the number of flat tags before and after the hierarchical part varies), so ideally the solution should make no assumptions about tag names, attribute names, or nesting depth.
A top-level view of the hierarchy for just 4 levels would look something like
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
and the desired output would then be
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>A</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
<Level 1>
...
<Level 2>
...
<Level 3>
...
<Level 4>B</Level 4>
...
</Level 3>
...
</Level 2>
...
</Level 1>
That is, if at each level i there are Li different components, a total of Product(Li) different components will be produced (just 2 above, since the only differentiating factor is Level 4, so L1*L2*L3*L4 = 2).
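To make the expected number of output blocks concrete, here is a minimal Python sketch that counts them without building anything. It rests on one reading of the rule above, which is an assumption: same-named siblings are alternatives (their counts add up), while differently named children combine (their counts multiply). The function name `count_blocks` is made up for illustration, and the tags `L1` to `L4` stand in for "Level 1" to "Level 4" because XML names cannot contain spaces.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict


def count_blocks(elem):
    """Count the shredded blocks that elem would expand into."""
    # Group the children by tag name; no tag names are hard-coded.
    by_tag = defaultdict(list)
    for child in elem:
        by_tag[child.tag].append(child)
    # Within a group of same-named siblings exactly one is kept per block,
    # so their counts add. Across groups the choices are independent, so
    # the sums multiply. math.prod of an empty sequence is 1, so a leaf
    # element counts as a single block.
    return math.prod(
        sum(count_blocks(child) for child in group)
        for group in by_tag.values()
    )


doc = ET.fromstring("<L1><L2><L3><L4>A</L4><L4>B</L4></L3></L2></L1>")
print(count_blocks(doc))  # 2
```

Only Level 4 repeats here, so the result is 2. This needs Python 3.8 or later for `math.prod`.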
From what I have seen around, XSLT may be the way to go, but any other solution (e.g., StAX or even JDOM) would do.
A more detailed example, using fictitious information, would be
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
The above data should be shredded into 5 blocks (i.e., one for each different <Job> block), each of which will leave all other tags identical and just have a single <Job> element. So, given the 5 different <Job> blocks in the above example, the transformed ("shredded") XML would be
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/10/2001</StartDate>
<Months>38</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/12/2004</StartDate>
<Months>6</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="US">
<Comment>List of previous jobs in the US</Comment>
<Jobs>3</Jobs>
<JobDetails>
<Job title = "Senior Developer">
<StartDate>01/06/2005</StartDate>
<Months>10</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/05/1999</StartDate>
<Months>25</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
<Employee name="A Name">
<Address>123 A Street</Address>
<Age>28</Age>
<EmploymentHistory>
<Employment country="UK">
<Comment>List of previous jobs in the UK</Comment>
<Jobs>2</Jobs>
<JobDetails>
<Job title = "Junior Developer">
<StartDate>01/07/2001</StartDate>
<Months>3</Months>
</Job>
</JobDetails>
</Employment>
</EmploymentHistory>
<Available>true</Available>
<Experience unit="years">6</Experience>
</Employee>
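For comparison with XSLT, StAX or JDOM, here is a rough Python sketch of one generic way to do this kind of shredding. It is not a definitive solution. The assumption is that any group of same-named siblings is treated as a set of alternatives, with exactly one kept per output block, while every other element stays where it was in the input. The function name `shred` and the abridged `SAMPLE` document (fewer child tags than the full Employee example) are made up for illustration.

```python
import copy
import itertools
import xml.etree.ElementTree as ET

# Abridged version of the Employee example: 3 US jobs and 2 UK jobs.
SAMPLE = """
<Employee name="A Name">
  <Address>123 A Street</Address>
  <EmploymentHistory>
    <Employment country="US">
      <JobDetails>
        <Job title="Senior Developer"><Months>38</Months></Job>
        <Job title="Senior Developer"><Months>6</Months></Job>
        <Job title="Senior Developer"><Months>10</Months></Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <EmploymentHistory>
    <Employment country="UK">
      <JobDetails>
        <Job title="Junior Developer"><Months>25</Months></Job>
        <Job title="Junior Developer"><Months>3</Months></Job>
      </JobDetails>
    </Employment>
  </EmploymentHistory>
  <Available>true</Available>
</Employee>
"""


def shred(elem):
    """Return copies of elem in which no element has a same-named sibling."""
    # Group children by tag, in order of first appearance (dicts keep
    # insertion order). No tag or attribute names are hard-coded.
    groups = {}
    for child in elem:
        groups.setdefault(child.tag, []).append(child)

    # One slot per tag: all variants of all same-named siblings.
    slots = [
        [variant for child in siblings for variant in shred(child)]
        for siblings in groups.values()
    ]

    variants = []
    # product() over zero slots yields one empty tuple, so a leaf element
    # produces exactly one variant: a copy of itself.
    for choice in itertools.product(*slots):
        clone = ET.Element(elem.tag, dict(elem.attrib))
        clone.text, clone.tail = elem.text, elem.tail
        clone.extend([copy.deepcopy(c) for c in choice])
        variants.append(clone)
    return variants


for block in shred(ET.fromstring(SAMPLE)):
    country = block.find("EmploymentHistory/Employment").get("country")
    print(country, block.findtext(".//Job/Months"))
# US 38
# US 6
# US 10
# UK 25
# UK 3
```

The sketch holds the whole tree in memory, so for very large files it would probably have to be applied to one top-level record at a time. It also moves same-named siblings that are interleaved with other tags to the position of the first one.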