FileToXML
Introduction:
This connector transforms a file in another format (CSV, flat file, EDIFACT...) into an XML file, using a description file called the « format ».
Icon:
The connector exists in the scenario panel:
Configuration:
FILE_TO_XML is a transformation connector. To set it up, click the blue button above the connector and fill in the properties displayed:
- Name: Names the processing step (connector). Giving meaningful names is strongly recommended.
- Format: The XML format file that describes how to transform the input file to XML, or vice versa. Several standard formats already exist (EDIFACT, X12...).
- FileName: Names the output file of the current step. This name is very useful for monitoring the platform.
- Input & Output Charset: Specify the encoding of the input and output files. The default encoding is UTF-8.
- Condition: Adds a condition, so that the processing step is executed only if the condition is met.
- Add boundary OnError: Check this option only to add extra processing that runs when an error occurs in the processing step.
The format file:
The format file is mandatory for the connector to work. The iXPath file formats are described in XML and stored in the client environment's format directory.
The XML format represents the actual hierarchical structure of the temporary format file.
The XML output file strictly uses the tag names and the structure defined in the format file.
This chapter goes through the possible cases of the format file.
EDIFACT format
Introduction:
The EDIFACT format lets the user transform an EDIFACT file into an XML file that respects the standard format. Many platforms validate such structured files.
An XML document must contain one root element that is the parent of all other elements. This element must occur exactly once in the document. In general it is a node called "ixDOC".
Elements of the file format:
Any group of information (even one that is not delimited by characters, a line break or anything else) must be represented in the document (e.g. groups of EDIFACT segments, messages, etc.).
Attributes are among the most important elements of the XML format file. They tell the translation processor what actions to take when reading the document. They can be defined globally on the root element or individually on each tag.
All attributes defined on the root tag are automatically carried over to every tag in the format. They are therefore the default attributes. An attribute that is not defined on a tag takes the default value defined in the document.
Specify an attribute only when it is really useful, because some attributes can slow down processing.
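The inheritance rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the connector's implementation: the helper name `effective_attributes` is hypothetical, the format fragment is a shortened excerpt written for this sketch, and the merge only covers what the text states (root attributes act as defaults, attributes set on a tag override them).

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative format fragment (not a complete format file).
FORMAT_XML = """
<ixDOC format="variable" min="0" max="1" emptyEnd="true" lastEnd="false">
  <UNB end="'" lastEnd="true" min="1">
    <TAG end="+" key="true">UNB</TAG>
  </UNB>
</ixDOC>
"""

def effective_attributes(root, element):
    """Return the root attributes overridden by the element's own attributes."""
    merged = dict(root.attrib)      # defaults declared on the root tag
    merged.update(element.attrib)   # values set individually on the tag win
    return merged

root = ET.fromstring(FORMAT_XML)
unb = root.find("UNB")
tag = unb.find("TAG")

unb_attrs = effective_attributes(root, unb)
tag_attrs = effective_attributes(root, tag)

# UNB overrides lastEnd and min, and inherits format from the root.
print(unb_attrs["format"], unb_attrs["lastEnd"], unb_attrs["min"])
# TAG defines neither lastEnd nor format, so both come from the root.
print(tag_attrs["format"], tag_attrs["lastEnd"], tag_attrs["key"])
```

In this sketch a tag never inherits from its direct parent, only from the root, because that is the only rule the text describes.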
List of Processing Attributes:
| Attribute | Type | Default value | Description |
| format | string | « none » | Defines how the data is taken from the file. |
| end | string | empty | Defines the character or characters that delimit a variable format. The characters "\r" (carriage return), "\n" (newline) and "\t" (tab) can be used. |
| length | integer | empty | Defines the number of characters to take for a structure of type "fixed". |
| min | integer | 0 | Defines the minimum number of successive occurrences of this tag. If min="1", the tag is mandatory and must exist in the output file. |
| max | integer | 1 | Defines the maximum number of successive occurrences of this tag. The character "n" can be used to indicate that the tag can be repeated without an upper limit. |
| doRead | boolean | true | Used for reading only. Defines whether the XML tag is written to the output XML file. Not writing a tag frees up memory and speeds up processing. This attribute becomes essential when processing large files (on the order of megabytes). |
| skipEmpty | boolean | true | Used for reading only. Defines whether an empty XML tag is generated when the data is empty in the source file. |
| errorLevel | boolean | false | When this attribute is set to "true", the translator keeps a copy of the reading point. If an error occurs, the first characters are returned in the error message. It is advisable to set this attribute to "true" on records or segments. |
| key | boolean | false | Defines whether the data is a key for its parent. The data must be identical to the value inside the tag described in the format. |
| dataPick | boolean | true | For the "fixed" format only. Defines whether the data is actually taken. |
| emptyEnd | boolean | false if the tag has children, true if the tag has no children | Defines whether the end separator of a variable format is written when the data is empty. |
| lastEnd | boolean | true | Defines whether the end separator of a variable format is written when the data is the last in a sequence. |
| optionChar | string | empty | For a fixed format, defines one or more characters that are taken beyond the size initially planned. This attribute is particularly useful for handling optional line breaks. |
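To make the difference between the "fixed" and "variable" reading modes more concrete, here is a minimal Python sketch. It is a simplification written for this page, not the translator's code: the function names `read_fixed` and `read_variable` are hypothetical, and the way `optionChar` characters are consumed and discarded is an assumption.

```python
def read_fixed(data, pos, length, option_chars=""):
    """Read exactly `length` characters, then skip optional trailing characters."""
    value = data[pos:pos + length]
    pos += length
    # optionChar (assumed behaviour): extra characters accepted beyond the
    # planned size, for instance optional line breaks after a record.
    while pos < len(data) and data[pos] in option_chars:
        pos += 1
    return value, pos

def read_variable(data, pos, end):
    """Read up to the `end` delimiter; the delimiter itself is consumed."""
    stop = data.find(end, pos)
    if stop == -1:                      # no delimiter left: take the rest
        return data[pos:], len(data)
    return data[pos:stop], stop + len(end)

# Fixed format: a 4-character code followed by a 10-character name.
line = "0001DUPONT    \r\nNEXT"
code, pos = read_fixed(line, 0, 4)
name, pos = read_fixed(line, pos, 10, option_chars="\r\n")
print(repr(code), repr(name), pos)

# Variable format: data elements delimited by "+".
segment = "BGM+351+LIVR0211000636+9'"
tag, pos = read_variable(segment, 0, "+")
doc_code, pos = read_variable(segment, pos, "+")
print(tag, doc_code, pos)
```

The fixed reader needs only a length, whereas the variable reader needs an end delimiter, which matches the roles of the `length` and `end` attributes in the table.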
List of Formatting Attributes:
| Attribute | Type | Default value | Description |
| minSize | integer | 0 | Defines the minimum size of the data to be read. |
| maxSize | integer | 2099999999 | Defines the maximum size of the data to be read. |
| numeric | boolean | false | Defines whether the field is numeric. For the fixed format, the value is padded with zeros on the left. For the variable format, the value is formatted according to the minimum size. |
| relChar | string | empty | Defines an escape character, needed when the string to be read contains a character that is reserved in the format file. Example: the escape character "?" is used to write/read the segment NAD+BY++L?'Oreal' because ' is a reserved character in the format file and the string "L'Oreal" contains it. |
| trim | boolean | true | Removes spaces before and after the retrieved data. |
| setAttribute | string | empty | The collected data is stored in the named variable. This variable can then be reused in the following attributes with the prefix "$". This makes a dynamic structure possible (as in EDIFACT, where the separators are defined in the UNA segment). |
| alignAlpha | string | left | Used for writing only. Defines the alignment of a character string (not numeric). Possible values are "left" and "right". |
| fillAlpha | string | « » (space) | Defines the character used to pad a text whose size is less than the minimum size (defined by "minSize" or "length"). |
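The padding rules described by `numeric`, `alignAlpha`, `fillAlpha` and `trim` can be illustrated with a short Python sketch. The function `pad_field` and its parameter names are hypothetical and only mirror the attribute descriptions above. How the real translator treats values longer than the target size or negative numbers is not documented here, so the sketch simply returns over-long values unchanged.

```python
def pad_field(value, size, numeric=False, align_alpha="left",
              fill_alpha=" ", trim=True):
    """Pad `value` to `size` characters following the formatting attributes."""
    if trim:
        value = value.strip(" ")          # trim: remove surrounding spaces
    if len(value) >= size:
        return value                      # assumption: no truncation here
    if numeric:
        return value.rjust(size, "0")     # numeric: zeros on the left
    if align_alpha == "right":
        return value.rjust(size, fill_alpha)
    return value.ljust(size, fill_alpha)  # default alignment: left

print(repr(pad_field("42", 6, numeric=True)))
print(repr(pad_field(" ABC ", 6)))
print(repr(pad_field("ABC", 6, align_alpha="right", fill_alpha="*")))
```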
List of Log Attributes:
| Attribute | Type | Default value | Description |
| logLevel | string | empty | Creates an object in the tracking (with the name given in the attribute) for each occurrence of this tag that is read. Several objects can be defined, separated by ";". Sub-objects are possible: "Object.SubObject". Examples: « Message », « Message;Toto », « Message.Ligne » |
| logLevelKey | string | empty | Defines the name of a tracking field used to group multiple records into a single message. This is useful for single-record files. Multiple fields can be defined, separated by ";". Example: « Message.DocNumber » |
| logTracking | string | empty | Feeds a tracking field for the given object type. The syntax is « Object.FieldName ». Example: « Message.DocNumber;Message.TrackingNumber;Message.Ligne.LineID » |
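As a reading aid for the `logTracking` syntax, the Python sketch below splits a value into (object, field) pairs. It rests on an assumption drawn from the examples above: the last dot-separated component is taken to be the field name and everything before it the object path (possibly with sub-objects). The function name `parse_log_tracking` is hypothetical.

```python
def parse_log_tracking(spec):
    """Split 'Object.Field;Object.Sub.Field' into (object_path, field) pairs."""
    pairs = []
    for entry in spec.split(";"):         # entries are separated by ";"
        entry = entry.strip()
        if not entry:
            continue                      # ignore empty entries
        # Assumption: the last component is the field, the rest is the object.
        object_path, _, field = entry.rpartition(".")
        pairs.append((object_path, field))
    return pairs

spec = "Message.DocNumber;Message.TrackingNumber;Message.Ligne.LineID"
for object_path, field in parse_log_tracking(spec):
    print(object_path, "->", field)
```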
Example:
This part gives an example of an input file, the matching format file and the resulting output file.
Input file:
UNA:+.? '
UNB+UNOC:3+30166749103:14+37630394010:14+211011:1644+0000001'
UNH+1+FORMAT:A:12C:UN:EAN006'
BGM+351+LIVR0211000636+9'
DTM+137:20211011:102'
DTM+2:20211005:102'
DTM+11:20211011:102'
DTM+76:20211014:102'
RFF+DQ:LIVR0211000636'
DTM+171:20211014:102'
RFF+ON:5255H21417C3ACHE204'
DTM+171:20211001:102'
UNT+25+1'
UNZ+1+0000001'
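The first segment of the input file (UNA) declares the separators used by the rest of the interchange, which is what the `setAttribute` mechanism captures in the format file. The Python sketch below shows the idea on a shortened sample. The function names `parse_una` and `split_escaped` are hypothetical, the sample string is built for this sketch (it reuses the NAD segment from the `relChar` description), and the code is a simplified illustration rather than the translator's actual algorithm.

```python
def parse_una(data):
    """Return the six separators declared in a leading 9-character UNA segment."""
    if not data.startswith("UNA") or len(data) < 9:
        raise ValueError("no UNA segment")
    names = ("subChar", "dataChar", "numChar", "relChar", "repChar", "segChar")
    return dict(zip(names, data[3:9]))

def split_escaped(text, separator, release, unescape=False):
    """Split on `separator`, ignoring separators preceded by `release`."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == release and i + 1 < len(text):
            # Escaped character: never a separator. Keep the release character
            # unless this is the last splitting level (unescape=True).
            current.append(text[i + 1] if unescape else ch + text[i + 1])
            i += 2
            continue
        if ch == separator:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts

sample = "UNA:+.? 'UNB+UNOC:3'NAD+BY++L?'Oreal'"
seps = parse_una(sample)
body = sample[9:]
segments = [s for s in split_escaped(body, seps["segChar"], seps["relChar"]) if s]
elements = split_escaped(segments[1], seps["dataChar"], seps["relChar"], unescape=True)
print(segments)
print(elements)
```

The release character is kept while splitting segments and removed only at the last level, so that an escaped separator is not misread by a later split.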
Format File:
<?xml version="1.0" encoding="windows-1250"?>
<ixDOC format="variable" min="0" max="1" subChar=":" dataChar="+" numChar="." relChar="?" repChar=" " segChar="'" emptyEnd="true" lastEnd="false" optionChar="\n\r">
  <UNA format="fixed" length="9" optionChar="\r\n" min="0" max="1" errorLevel="true">
    <TAG format="fixed" key="true" length="3" min="1">UNA</TAG>
    <SubChar format="fixed" setAttribute="subChar" length="1"/>
    <DataChar format="fixed" setAttribute="dataChar" length="1"/>
    <NumChar format="fixed" setAttribute="numChar" length="1"/>
    <RelChar format="fixed" setAttribute="relChar" length="1"/>
    <RepChar format="fixed" setAttribute="repChar" length="1"/>
    <SegChar format="fixed" setAttribute="segChar" length="1" trim="false" optionChar="\r\n"/>
  </UNA>
  <INTERCHANGE format="none" min="1" max="1">
    <UNB end="$segChar" lastEnd="true" emptyEnd="false" min="1" max="1" optionChar="\r\n" errorLevel="true">
      <TAG end="$dataChar" min="1" key="true">UNB</TAG>
      <S001 end="$dataChar" min="1">
        <D0001 end="$subChar" min="1" minSize="4" maxSize="4"/>
        <D0002 end="$subChar" min="1" minSize="1" maxSize="1"/>
        <D0080 end="$subChar" min="0" maxSize="6"/>
        <D0133 end="$subChar" min="0" maxSize="3"/>
      </S001>
      <S002 end="$dataChar" min="1">
        <D0004 end="$subChar" min="1" maxSize="35"/>
        <D0007 end="$subChar" min="0" maxSize="4"/>
        <D0008 end="$subChar" min="0" maxSize="35"/>
        <D0042 end="$subChar" min="0" maxSize="35"/>
      </S002>
      <S003 end="$dataChar" min="1">
        <D0010 end="$subChar" min="1" maxSize="35"/>
        <D0007 end="$subChar" min="0" maxSize="4"/>
        <D0014 end="$subChar" min="0" maxSize="35"/>
        <D0046 end="$subChar" min="0" maxSize="35"/>
      </S003>
      <S004 end="$dataChar" min="1">
        <D0017 end="$subChar" min="1" minSize="6" maxSize="6" numeric="true"/>
        <D0019 end="$subChar" min="1" minSize="4" maxSize="4" numeric="true"/>
      </S004>
      <D0020 end="$dataChar" min="1" maxSize="14"/>
      <S0005 end="$dataChar">
        <D0022 end="$subChar" min="1" maxSize="14"/>
        <D0025 end="$subChar" min="0" minSize="2" maxSize="2"/>
      </S0005>
      <D0026 end="$dataChar" maxSize="14"/>
      <D0029 end="$dataChar" minSize="1" maxSize="1"/>
      <D0031 end="$dataChar" minSize="1" maxSize="1" numeric="true"/>
      <D0032 end="$dataChar" maxSize="35"/>
      <D0035 end="$dataChar" minSize="1" maxSize="1" numeric="true"/>
    </UNB>
    <MESSAGE format="none" min="0" max="n">
      <UNH end="$segChar" lastEnd="true" emptyEnd="false" min="1" max="1" optionChar="\r\n" errorLevel="true">
        <TAG end="$dataChar" key="true" min="1">UNH</TAG>
        <D0062 end="$dataChar" min="1" maxSize="14"/>
        <S009 end="$dataChar" min="1">
          <D0065 end="$subChar" min="1" maxSize="6"/>
          <D0052 end="$subChar" min="1" maxSize="3"/>
          <D0054 end="$subChar" min="1" maxSize="3"/>
          <D0051 end="$subChar" min="1" maxSize="2"/>
          <D0057 end="$subChar" min="0" maxSize="6"/>
        </S009>
        <D0068 end="$dataChar" min="0" maxSize="35"/>
        <S010 end="$dataChar" min="0">
          <D0070 end="$subChar" min="1" maxSize="2" numeric="true"/>
          <D0073 end="$subChar" min="0" maxSize="1"/>
        </S010>
      </UNH>
      <BGM end="$segChar" lastEnd="true" emptyEnd="false" min="1" max="1" optionChar="\r\n" errorLevel="true">
        <TAG end="$dataChar" key="true" min="1">BGM</TAG>
        <C002 end="$dataChar" min="0">
          <D1001 end="$subChar" min="0" maxSize="3"/>
          <D1131 end="$subChar" min="0" maxSize="3"/>
          <D3055 end="$subChar" min="0" maxSize="3"/>
          <D1000 end="$subChar" min="0" maxSize="35"/>
        </C002>
        <D1004 end="$dataChar" min="0" maxSize="35"/>
        <D1225 end="$dataChar" min="0" maxSize="3"/>
        <D4343 end="$dataChar" min="0" maxSize="3"/>
      </BGM>
      <DTM end="$segChar" lastEnd="true" emptyEnd="false" min="0" max="10" optionChar="\r\n" errorLevel="true">
        <TAG end="$dataChar" key="true" min="1">DTM</TAG>
        <C507 end="$dataChar" min="1">
          <D2005 end="$subChar" min="1" maxSize="3"/>
          <D2380 end="$subChar" min="0" maxSize="35"/>
          <D2379 end="$subChar" min="0" maxSize="3"/>
        </C507>
      </DTM>
      <SG1 min="0" max="10" format="none">
        <RFF end="$segChar" lastEnd="true" emptyEnd="false" min="1" max="1" optionChar="\r\n" errorLevel="true">
          <TAG end="$dataChar" key="true" min="1">RFF</TAG>
          <C506 end="$dataChar" min="1">
            <D1153 end="$subChar" min="1" maxSize="3"/>
            <D1154 end="$subChar" min="0" maxSize="35"/>
            <D1156 end="$subChar" min="0" maxSize="6"/>
            <D4000 end="$subChar" min="0" maxSize="35"/>
          </C506>
        </RFF>
        <DTM end="$segChar" lastEnd="true" emptyEnd="false" min="0" max="1" optionChar="\r\n" errorLevel="true">
          <TAG end="$dataChar" key="true" min="1">DTM</TAG>
          <C507 end="$dataChar" min="1">
            <D2005 end="$subChar" min="1" maxSize="3"/>
            <D2380 end="$subChar" min="0" maxSize="35"/>
            <D2379 end="$subChar" min="0" maxSize="3"/>
          </C507>
        </DTM>
      </SG1>
      <UNT end="$segChar" lastEnd="true" emptyEnd="false" min="1" max="1" optionChar="\r\n" errorLevel="true">
        <TAG end="$dataChar" key="true" min="1">UNT</TAG>
        <D0074 end="$dataChar" min="1" maxSize="6" numeric="true"/>
        <D0062 end="$dataChar" min="1" maxSize="14"/>
      </UNT>
    </MESSAGE>
    <UNZ end="$segChar" lastEnd="true" emptyEnd="false" min="1" optionChar="\r\n" errorLevel="true">
      <TAG end="$dataChar" key="true" min="1">UNZ</TAG>
      <D0036 end="$dataChar" min="1" maxSize="6" numeric="true"/>
      <D0020 end="$dataChar" min="1" maxSize="14"/>
    </UNZ>
  </INTERCHANGE>
</ixDOC>
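The `min` and `max` attributes in the format file above bound how many times each tag may repeat, with "n" meaning no upper limit. The Python sketch below shows one way to read and check these bounds. The helper names `occurrence_bounds` and `check_occurrences` are hypothetical, the excerpt is reduced to the attributes that matter here, and the sketch applies the documented defaults (0 and 1) without taking root-level inheritance into account.

```python
import xml.etree.ElementTree as ET

# Reduced excerpt of the MESSAGE group, keeping only min/max/format.
FORMAT_EXCERPT = """
<MESSAGE format="none" min="0" max="n">
  <BGM min="1" max="1"/>
  <DTM min="0" max="10"/>
  <SG1 min="0" max="10" format="none"/>
</MESSAGE>
"""

def occurrence_bounds(element):
    """Return (min, max) for a format tag; max is None when set to "n"."""
    minimum = int(element.get("min", "0"))   # documented default: 0
    raw_max = element.get("max", "1")        # documented default: 1
    maximum = None if raw_max == "n" else int(raw_max)
    return minimum, maximum

def check_occurrences(element, count):
    """Tell whether `count` successive occurrences respect min and max."""
    minimum, maximum = occurrence_bounds(element)
    return count >= minimum and (maximum is None or count <= maximum)

message = ET.fromstring(FORMAT_EXCERPT)
dtm = message.find("DTM")
bgm = message.find("BGM")

print(occurrence_bounds(message))   # MESSAGE may repeat without limit
print(check_occurrences(dtm, 4))    # four DTM segments, as in the input file
print(check_occurrences(dtm, 11))   # more than max="10"
print(check_occurrences(bgm, 0))    # BGM is mandatory (min="1")
```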
Output File:
<?xml version="1.0" encoding="UTF-8"?>
<ixDOC>
  <UNA>
    <TAG>UNA</TAG>
    <SubChar>:</SubChar>
    <DataChar>+</DataChar>
    <NumChar>.</NumChar>
    <RelChar>?</RelChar>
    <SegChar>'</SegChar>
  </UNA>
  <INTERCHANGE>
    <UNB>
      <TAG>UNB</TAG>
      <S001>
        <D0001>UNOC</D0001>
        <D0002>3</D0002>
      </S001>
      <S002>
        <D0004>30166749103</D0004>
        <D0007>14</D0007>
      </S002>
      <S003>
        <D0010>37630394010</D0010>
        <D0007>14</D0007>
      </S003>
      <S004>
        <D0017>211011</D0017>
        <D0019>1644</D0019>
      </S004>
      <D0020>0000001</D0020>
    </UNB>
    <MESSAGE>
      <UNH>
        <TAG>UNH</TAG>
        <D0062>1</D0062>
        <S009>
          <D0065>FORMAT</D0065>
          <D0052>A</D0052>
          <D0054>12C</D0054>
          <D0051>UN</D0051>
          <D0057>EAN006</D0057>
        </S009>
      </UNH>
      <BGM>
        <TAG>BGM</TAG>
        <C002>
          <D1001>351</D1001>
        </C002>
        <D1004>LIVR0211000636</D1004>
        <D1225>9</D1225>
      </BGM>
      <DTM>
        <TAG>DTM</TAG>
        <C507>
          <D2005>137</D2005>
          <D2380>20211011</D2380>
          <D2379>102</D2379>
        </C507>
      </DTM>
      <DTM>
        <TAG>DTM</TAG>
        <C507>
          <D2005>2</D2005>
          <D2380>20211005</D2380>
          <D2379>102</D2379>
        </C507>
      </DTM>
      <DTM>
        <TAG>DTM</TAG>
        <C507>
          <D2005>11</D2005>
          <D2380>20211011</D2380>
          <D2379>102</D2379>
        </C507>
      </DTM>
      <DTM>
        <TAG>DTM</TAG>
        <C507>
          <D2005>76</D2005>
          <D2380>20211014</D2380>
          <D2379>102</D2379>
        </C507>
      </DTM>
      <SG1>
        <RFF>
          <TAG>RFF</TAG>
          <C506>
            <D1153>DQ</D1153>
            <D1154>LIVR0211000636</D1154>
          </C506>
        </RFF>
        <DTM>
          <TAG>DTM</TAG>
          <C507>
            <D2005>171</D2005>
            <D2380>20211014</D2380>
            <D2379>102</D2379>
          </C507>
        </DTM>
      </SG1>
      <SG1>
        <RFF>
          <TAG>RFF</TAG>
          <C506>
            <D1153>ON</D1153>
            <D1154>5255H21417C3ACHE204</D1154>
          </C506>
        </RFF>
        <DTM>
          <TAG>DTM</TAG>
          <C507>
            <D2005>171</D2005>
            <D2380>20211001</D2380>
            <D2379>102</D2379>
          </C507>
        </DTM>
      </SG1>
      <UNT>
        <TAG>UNT</TAG>
        <D0074>25</D0074>
        <D0062>1</D0062>
      </UNT>
    </MESSAGE>
    <UNZ>
      <TAG>UNZ</TAG>
      <D0036>1</D0036>
      <D0020>0000001</D0020>
    </UNZ>
  </INTERCHANGE>
</ixDOC>
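Because the output file reuses the tag names of the format file, a later step can address the data with simple paths. The Python sketch below reads a reduced excerpt of the output with the standard library. The excerpt is shortened for this sketch and the variable names are illustrative; it only shows how the structure can be consumed and is not part of the connector.

```python
import xml.etree.ElementTree as ET

# Reduced excerpt of the output file (one MESSAGE, BGM and RFF groups only).
OUTPUT_EXCERPT = """
<MESSAGE>
  <BGM>
    <TAG>BGM</TAG>
    <C002><D1001>351</D1001></C002>
    <D1004>LIVR0211000636</D1004>
    <D1225>9</D1225>
  </BGM>
  <SG1>
    <RFF><TAG>RFF</TAG><C506><D1153>DQ</D1153><D1154>LIVR0211000636</D1154></C506></RFF>
  </SG1>
  <SG1>
    <RFF><TAG>RFF</TAG><C506><D1153>ON</D1153><D1154>5255H21417C3ACHE204</D1154></C506></RFF>
  </SG1>
</MESSAGE>
"""

message = ET.fromstring(OUTPUT_EXCERPT)

# Document number carried by the BGM segment.
document_number = message.findtext("BGM/D1004")

# References indexed by their qualifier (D1153), one RFF per SG1 group.
references = {
    rff.findtext("C506/D1153"): rff.findtext("C506/D1154")
    for rff in message.iterfind("SG1/RFF")
}

print(document_number)
print(references)
```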