

Apart from the methods discussed in the previous sections, Yellowfin’s Step API, partially implemented by AbstractETLRowStep and AbstractETLCachedStep, also provides a number of helper methods. Most of these are convenience methods, but some are meant to be overridden. The API also provides hooks into various stages of processing. For example, row steps can override preEndRows() to do extra processing before the step finishes.

The javadoc has the complete list of methods, but the important ones are covered here. These are divided into the following categories:

 



Input/Output Methods

These methods deal with input/output steps and flows. They return results only during step execution, so they won’t return anything when called from setupGeneratedFields() or getValidatedStepOptions(). Methods such as validate() and processEndRows() are called during the step’s execution, so these methods may safely be used from them.


public Set<String> getInputFlowUuids()

public Set<String> getOutputFlowUuids()

These methods fetch all input/output flow UUIDs. Since most steps don’t have multiple inputs or outputs, getFirstInputFlow() and getFirstOutputFlow() may be more useful.

public ETLStep getInputStep(String inFlowUuid)

public ETLStep getOutputStep(String outFlowUuid)

These methods fetch a connected input/output step based on the flow UUID.


 


Field Methods

These methods deal with the step’s fields, including Default and Output Metadata Fields. They cover operations such as adding and removing fields.

 


public List<ETLStepMetadataFieldBean> getMetadataFields(String outFlowUuid)

This method returns the output metadata fields for the specified output flow UUID. It does not return the default metadata fields when outFlowUuid is null.

public List<ETLStepMetadataFieldBean> getDefaultMetadataFields()

This method returns all default metadata fields. The API has other convenience methods for fetching just the field UUIDs, or the fields as a map keyed by fieldUUID.

protected boolean isGeneratedField(ETLStepMetadataFieldBean field)

This method returns true when the parameter is a generated default metadata field, that is, a field created by the step itself. It is particularly useful for telling generated fields apart from the step’s other default fields.

public ETLStepMetadataFieldBean addNewGeneratedField(ETLStepMetadataFieldBean field, String optionKey)

This method creates a new default metadata field with a linkType of NEWFIELD. It also creates a step option with the specified option key and sets its value to the field’s UUID. The method generates the fieldUUID itself, overwriting any value already set on the field passed in. It returns the newly built and linked field. For advanced options, such as setting the linkType and linkFieldUUID, use addGeneratedField(). Other convenience methods replace existing Default Metadata Fields; see replaceDefaultField(), restoreReplacedField() and isReplacementField() in the javadoc.

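
The following plain-Java model restates what addNewGeneratedField() is described as doing: generate a UUID, mark the field as NEWFIELD, add it to the default fields, and record the UUID under the option key. FieldSketch and GeneratedFieldSketch are hypothetical stand-ins written for this illustration; they are not Yellowfin classes, and the real method does more bookkeeping than this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical stand-in for ETLStepMetadataFieldBean with only the
// properties this sketch needs.
class FieldSketch {
    String fieldUuid;
    String fieldName;
    String linkType;

    FieldSketch(String fieldUuid, String fieldName) {
        this.fieldUuid = fieldUuid;
        this.fieldName = fieldName;
    }
}

// Hypothetical model of the documented behaviour, not Yellowfin's code.
class GeneratedFieldSketch {
    final List<FieldSketch> defaultFields = new ArrayList<>();
    final Map<String, String> stepOptions = new HashMap<>();

    FieldSketch addNewGeneratedField(FieldSketch field, String optionKey) {
        // The UUID is always generated, overwriting any value on the parameter.
        field.fieldUuid = UUID.randomUUID().toString();
        field.linkType = "NEWFIELD";
        defaultFields.add(field);
        // The step option links the option key to the new field's UUID.
        stepOptions.put(optionKey, field.fieldUuid);
        return field;
    }
}
```

In this model, looking up the option key later returns the generated UUID, which is how a step can find "its" generated field again on a subsequent run.
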
public void removeDefaultMetadataField(String fieldUUID)

This method removes an existing default metadata field. Avoid using it to remove default fields from an input step, as this may irreversibly corrupt the step.

protected void excludeDefaultField(String fieldUUID)

This method excludes a default metadata field from the flow.

It prevents a Default Metadata Field from being included in the step’s output metadata fields, so data in this field will not be available to the next step. However, the data is still available for internal processing and may be output in another field. For example, a text-to-number datatype conversion step could exclude the original text field and replace it with a converted numeric field.

protected void includeDefaultField(String fieldUUID, Integer position)

This method includes an excluded default metadata field in the flow again, restoring it at the specified zero-based position.
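
To make the difference between default and output fields concrete, the sketch below models a step’s default field list and its output field list as plain lists of UUID strings. Excluding a field removes it only from the output list; including it restores it at a zero-based index. FieldExclusionSketch is a hypothetical class for illustration, and it takes a plain int position rather than the Integer used by includeDefaultField().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how excluding and re-including default fields
// affects a step's output fields. Field UUIDs are plain strings here.
class FieldExclusionSketch {
    private final List<String> defaultFields;
    private final List<String> outputFields;

    FieldExclusionSketch(List<String> defaultFields) {
        this.defaultFields = new ArrayList<>(defaultFields);
        this.outputFields = new ArrayList<>(defaultFields);
    }

    // The field stays a default field (usable internally) but leaves the output.
    void excludeDefaultField(String fieldUuid) {
        outputFields.remove(fieldUuid);
    }

    // Restores a default field to the output at a zero-based position.
    void includeDefaultField(String fieldUuid, int position) {
        if (defaultFields.contains(fieldUuid) && !outputFields.contains(fieldUuid)) {
            outputFields.add(position, fieldUuid);
        }
    }

    List<String> getOutputFields() { return outputFields; }

    List<String> getDefaultFields() { return defaultFields; }
}
```

Starting from fields a, b and c, excluding b leaves the output as a, c while b remains a default field; including b at position 0 then gives b, a, c.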

public Map<String, String> getDefaultToInputFieldMap()

public Map<String, String> getInputToDefaultFieldMap()

These methods may be used to retrieve the source of a Default Metadata Field (default to input) and the target of an Input Metadata Field (input to default).

 



Configuration Methods

These methods deal with step options, specifically adding and fetching them. The only way to remove an option is to drop it during validation in getValidatedStepOptions(), which the step must implement.

public String getStepOption(String optionKey)

This method returns a step option which is stored as a String in the Yellowfin configuration database. The API provides a few convenience methods which convert the option to a specific data type. For example, getStepOptionValueJSONObject() converts an option to a com.google.gson.JsonObject and getStepIntegerValue() converts it to an Integer.
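
Because every option is stored as a String, typed access always involves parsing. The sketch below shows one way a step could do that conversion itself for an integer option. StepOptionSketch and getIntegerOption() are hypothetical names; returning null for a missing or malformed value is this sketch’s own choice, and the javadoc should be checked for how getStepIntegerValue() actually behaves in those cases.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: options are stored as Strings, so typed access means
// parsing. This mirrors the idea behind helpers such as getStepIntegerValue(),
// not their implementation.
class StepOptionSketch {
    private final Map<String, String> options = new HashMap<>();

    void addStepOption(String optionKey, String optionValue) {
        options.put(optionKey, optionValue);
    }

    String getStepOption(String optionKey) {
        return options.get(optionKey);
    }

    // Returns null when the option is missing or is not a valid integer.
    Integer getIntegerOption(String optionKey) {
        String raw = getStepOption(optionKey);
        if (raw == null) {
            return null;
        }
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```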

public byte[] getFile(String optionKey)

public String getFileName(String optionKey)

public String getText(String optionKey)

Step Options which reference files and CLOBs may be retrieved using these methods. These objects are fetched from tables in Yellowfin's configuration database.


public void addStepOption(String optionKey, String optionValue)

This method adds a new step option to the step’s configuration. The option key and value are saved in Yellowfin’s configuration database.

public void parseData(Map<String, Object> data)

This method validates and/or converts inputs from the front end. The step’s configuration UI is represented by a ParameterPanelCollection, which is composed of Parameter objects. These are basic building blocks such as a text box or a drop-down, though there are also more complex Parameter classes such as field-match and file-input. The input data from a parameter may have to be parsed before it is stored as a step option. For example, the field-match parameter may return separate lists for matched fields and excluded fields. To save them as separate step options, parseData() must convert the corresponding entry in the data map and put the results back into the map. This is shown in the example below.


import java.util.List;
import java.util.Map;

@Override
public void parseData(Map<String, Object> dataMap) {
    // Read the entry directly rather than looping over dataMap.keySet():
    // adding keys to a HashMap while iterating over its key set can throw
    // ConcurrentModificationException.
    Object matchFields = dataMap.get("MATCH_FIELDS");
    if (matchFields instanceof List) {
        // Fields are matched using "from" and "to" elements in the data string.
        // Convert [{from=uuid1, to=uuid2}, {from=uuid3, to=uuid4}] to
        // LEFT_FIELD0=uuid1, RIGHT_FIELD0=uuid2,
        // LEFT_FIELD1=uuid3, RIGHT_FIELD1=uuid4
        int i = 0;
        for (Object joinObj : (List<?>) matchFields) {
            if (joinObj instanceof Map) {
                Map<?, ?> join = (Map<?, ?>) joinObj;
                dataMap.put("LEFT_FIELD" + i, join.get("from"));
                dataMap.put("RIGHT_FIELD" + i, join.get("to"));
                i++;
            }
        }
    }

    // This shouldn't be saved as a StepOption
    dataMap.remove("MATCH_FIELDS");
}

After parsing, remove any entries that shouldn’t be saved as step options from the map.



 



Data Processing Methods

These methods may be used for manipulating data during the step’s execution.

 

public ETLStepResult getFreshDataPacket(String outFlowUuid)

Rows of data are transmitted between steps in Data Packets. This method returns an empty packet, which then needs to be populated. It is mostly used in cached steps; while processing a row step, the framework creates data packets internally.

public Wire<Object, String> getWireForField(String fieldUuid)

This method fetches the wire corresponding to a field UUID. It may be an input, default or output metadata field, since they are all linked together by the framework.

After getting a wire, you can use the following methods:

  • getValue() fetches the value on the wire.
  • removeValue() fetches the value and removes it from the wire.
  • send(Object value) puts the parameter object on the wire.
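
The difference between getValue() and removeValue() is easiest to see on a single wire. WireSketch below is a hypothetical one-value model of the three operations listed above; Yellowfin’s Wire<Object, String> is a framework class with its own implementation, and this sketch makes no claim about how it stores values.

```java
// Hypothetical single-value model of a wire; only the three documented
// operations are modelled.
class WireSketch {
    private Object value;

    // Puts a value on the wire (in this model, replacing any previous value).
    void send(Object value) {
        this.value = value;
    }

    // Reads the value and leaves it on the wire.
    Object getValue() {
        return value;
    }

    // Reads the value and clears the wire.
    Object removeValue() {
        Object current = value;
        value = null;
        return current;
    }
}
```

After send(42), getValue() returns 42 and leaves it in place, while removeValue() returns 42 and leaves the wire empty.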

protected void beginInternalTransmission(Object[] data, List<String> fields)


This method starts transmitting a row of data corresponding to the metadata fields specified in fields. These are usually input metadata fields, ordered to match the values in data. However, steps which generate data (a database input step, for example) have no input metadata fields. In that case, pass the default metadata fields, ordered to match data.

Transmission is essential because the input, default and output metadata fields may each be in a different order. Wires link them together, and transmission uses these precalculated mappings to send data to the correct fields, from input to default to output.
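
The idea of a precalculated mapping can be shown without any Yellowfin classes. In the sketch below, a map from input field UUID to default field UUID plays the role of getInputToDefaultFieldMap(), and the output order is given as a list of default field UUIDs. Both inputs, and the class TransmissionSketch, are assumptions made for illustration; in Yellowfin the framework builds and applies these links itself through wires.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: build an index mapping once, then reorder each
// row with it. Not Yellowfin's transmission code.
class TransmissionSketch {
    // For each output position, the index of the input column that feeds it.
    static int[] buildMapping(List<String> inputFields,
                              Map<String, String> inputToDefault,
                              List<String> outputDefaultOrder) {
        Map<String, Integer> defaultToInputIndex = new HashMap<>();
        for (int i = 0; i < inputFields.size(); i++) {
            String defaultUuid = inputToDefault.get(inputFields.get(i));
            if (defaultUuid != null) {
                defaultToInputIndex.put(defaultUuid, i);
            }
        }
        int[] mapping = new int[outputDefaultOrder.size()];
        for (int o = 0; o < mapping.length; o++) {
            Integer source = defaultToInputIndex.get(outputDefaultOrder.get(o));
            if (source == null) {
                throw new IllegalArgumentException("No input for " + outputDefaultOrder.get(o));
            }
            mapping[o] = source;
        }
        return mapping;
    }

    // Reorders one row using the precalculated mapping.
    static Object[] transmit(Object[] row, int[] mapping) {
        Object[] out = new Object[mapping.length];
        for (int o = 0; o < mapping.length; o++) {
            out[o] = row[mapping[o]];
        }
        return out;
    }
}
```

With inputs in1, in2, in3 linked to defaults d1, d2, d3 and an output order of d3, d1 (d2 excluded), the mapping is [2, 0], so the row a, b, c is transmitted as c, a. The mapping is computed once, and each row then costs only an array copy.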

protected ETLStepResult endInternalTransmission(ETLStepResult packet)

This method should be called after the step finishes processing the data transmitted by beginInternalTransmission(). Data is removed from internal wires and put into an array in an output data packet obtained using getFreshDataPacket().

Data may be retained on wires at the end of the transmission by using the overloaded method endInternalTransmission(ETLStepResult, boolean). This is useful in multi-output steps, where processed data should remain available for another output.

protected void emitData(ETLStepResult dataPacket)

This method sends a processed data packet to an output step. The data packet knows which output it should go to, so there is no need to specify one. The method throws ETLException and InterruptedException. The framework relies on these exceptions, so any code that calls this method must rethrow them unchanged; otherwise, important messages to the framework will be lost.
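
The safe pattern is simply to declare both exceptions and let them propagate. In the sketch below, EmitFailure is a hypothetical stand-in for ETLException and PacketSink is a hypothetical stand-in for emitData(); neither is a Yellowfin type. The point is the absence of a try/catch around the call.

```java
// Hypothetical stand-in for ETLException.
class EmitFailure extends Exception {
    EmitFailure(String message) {
        super(message);
    }
}

// Hypothetical stand-in for emitData(ETLStepResult).
interface PacketSink {
    void emit(Object packet) throws EmitFailure, InterruptedException;
}

class EmitSketch {
    // Declares both exceptions so they reach the caller (the framework) unchanged.
    static int emitAll(Iterable<?> packets, PacketSink sink)
            throws EmitFailure, InterruptedException {
        int sent = 0;
        for (Object packet : packets) {
            // No try/catch here: catching InterruptedException or the ETL
            // exception and carrying on would hide it from the framework.
            sink.emit(packet);
            sent++;
        }
        return sent;
    }
}
```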

 

 


Error Handling Methods

The API provides a few methods for error handling. Processing errors should be wrapped in instances of ETLException.

Error handling is simpler in row steps: processWireData() can throw an ETLException, and it will be counted towards the configured error threshold. Error handling in a cached step is more involved. Any exception thrown from processEndRows() causes the Data Transformation process to fail, so to make use of the error threshold, cached steps should catch processing errors and add them to the step’s errors collection.

The methods for error handling are described below:


public void throwUnhandledETLException(Throwable e)

This method throws an ETLException when an error cannot be handled. The Throwable passed to this method is wrapped in an ETLException, unless it already is one, in which case it is thrown as is.
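
The wrap-unless-already-wrapped rule can be sketched in a few lines. StepException is a hypothetical stand-in for ETLException, made a checked exception for this sketch only; the javadoc is the authority on whether ETLException itself is checked.

```java
// Hypothetical stand-in for ETLException.
class StepException extends Exception {
    StepException(String message, Throwable cause) {
        super(message, cause);
    }
}

class UnhandledErrorSketch {
    // Rethrows a StepException as is; wraps anything else, keeping it as the cause.
    static void throwUnhandled(Throwable e) throws StepException {
        if (e instanceof StepException) {
            throw (StepException) e;
        }
        throw new StepException(e.getMessage(), e);
    }
}
```

Keeping the original Throwable as the cause preserves its stack trace for whoever reads the error later.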

public ETLException getETLException(String message)

This is a convenience method for getting an instance of ETLException containing a specific message. The message may be a text string or an ApplicationResource key, in which case it must exist in ApplicationResources.properties.

protected void addError(ETLException e)

This method records errors during step execution. After adding an error, it checks whether the step has exceeded the error threshold, and throws an ETLStepErrorThresholdExceededException if it has. The method is most likely to be useful when implementing a cached step. For row steps, the framework handles errors, and the step only needs to throw an ETLException.
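
A cached step that respects the error threshold catches per-row failures and records them instead of letting them escape processEndRows(). The sketch below models that loop. RowError, ThresholdExceeded and CachedStepErrorSketch are hypothetical stand-ins for ETLException, ETLStepErrorThresholdExceededException and the step; treating the threshold as a maximum error count, and failing once the count goes above it, are assumptions of this sketch, since Yellowfin’s exact threshold semantics are not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ETLException.
class RowError extends Exception {
    RowError(String message) {
        super(message);
    }
}

// Hypothetical stand-in for ETLStepErrorThresholdExceededException.
class ThresholdExceeded extends RuntimeException {
    ThresholdExceeded(int count) {
        super("Error threshold exceeded after " + count + " errors");
    }
}

class CachedStepErrorSketch {
    private final int errorThreshold; // assumption: maximum number of tolerated errors
    private final List<RowError> errors = new ArrayList<>();

    CachedStepErrorSketch(int errorThreshold) {
        this.errorThreshold = errorThreshold;
    }

    // Records the error, then fails once the threshold is exceeded.
    void addError(RowError e) {
        errors.add(e);
        if (errors.size() > errorThreshold) {
            throw new ThresholdExceeded(errors.size());
        }
    }

    // Processes cached rows, collecting per-row failures instead of throwing them.
    List<Integer> process(List<String> rows) {
        List<Integer> parsed = new ArrayList<>();
        for (String row : rows) {
            try {
                parsed.add(parseRow(row));
            } catch (RowError e) {
                addError(e); // counts towards the threshold; may throw ThresholdExceeded
            }
        }
        return parsed;
    }

    private static int parseRow(String row) throws RowError {
        try {
            return Integer.parseInt(row);
        } catch (NumberFormatException e) {
            throw new RowError("Not a number: " + row);
        }
    }

    int errorCount() {
        return errors.size();
    }
}
```

With a threshold of 1, the rows "1", "x", "3" produce [1, 3] and one recorded error, while "a", "b" fail on the second bad row.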

 

 


Other Interesting Methods

There are a few other methods which are worth mentioning and might prove useful to know about.

 


public Map<String, String> onCopy(Map<String, String> uuidMap)

This method is called while the step is being copied. It is intended to be used as a way for steps to copy their external dependencies.

The parameter has a mapping of oldUUID to newUUID for any UUIDs inside the step including stepUUID, fieldUUIDs, flowUUIDs, metadataUUIDs, groupUUIDs, etc. which have been copied.

The step can choose to return a map of Step Options (optionKey, optionValue) which need to be updated. This is used for steps which have complex options which are related to the copied entities. Any option keys which are returned with a null value will be deleted.

When this method is called, the entities which are set up on the step will be the old entities, not the newly copied ones.
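
The sketch below shows the shape of a return value from onCopy(): options whose values embed copied UUIDs are rewritten through the uuidMap, and an option mapped to null is deleted from the copy. The option keys SORT_FIELDS and CACHE_TABLE are hypothetical, and the step’s current options are passed in as a map only to keep the sketch self-contained; a real step would read them with getStepOption().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical onCopy()-style logic; option keys are illustrative only.
class OnCopySketch {
    static Map<String, String> onCopy(Map<String, String> options, Map<String, String> uuidMap) {
        Map<String, String> updates = new HashMap<>();

        // SORT_FIELDS holds a comma-separated list of field UUIDs; remap each one.
        String sortFields = options.get("SORT_FIELDS");
        if (sortFields != null && !sortFields.isEmpty()) {
            StringBuilder remapped = new StringBuilder();
            for (String oldUuid : sortFields.split(",")) {
                if (remapped.length() > 0) {
                    remapped.append(',');
                }
                remapped.append(uuidMap.getOrDefault(oldUuid, oldUuid));
            }
            updates.put("SORT_FIELDS", remapped.toString());
        }

        // An external resource owned by the original step should not be shared;
        // a null value asks for the option to be deleted on the copy.
        if (options.containsKey("CACHE_TABLE")) {
            updates.put("CACHE_TABLE", null);
        }
        return updates;
    }
}
```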

public void onDelete()

This method is called just before the step is deleted and is intended to be used as a way for steps to clean up external dependencies.

public long getRowLimit()

This method returns the maximum number of rows which this step will output. This is particularly useful in Preview mode, which processes a relatively small number of rows. The framework already uses the row limit to restrict the number of rows output from the step, but the step can also use it to run more efficiently in Preview. Relatively slow operations, such as reading from a database or calling an external API, may be quicker and use less memory when the dataset is limited at the source. For example, a SELECT statement could include a LIMIT clause.
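
Pushing the limit down to the source could look like the helper below. RowLimitSketch is hypothetical, and it assumes a dialect that supports a trailing LIMIT clause (SQL Server, for example, uses TOP instead). Treating a non-positive limit as "no limit" is also an assumption of this sketch; check the javadoc for the value getRowLimit() returns when no limit applies.

```java
// Hypothetical helper: appends a LIMIT clause so the source returns fewer rows.
class RowLimitSketch {
    static String applyRowLimit(String selectSql, long rowLimit) {
        if (rowLimit <= 0) {
            return selectSql; // assumption: non-positive means no limit
        }
        return selectSql + " LIMIT " + rowLimit;
    }
}
```

For example, applyRowLimit("SELECT * FROM sales", 100) produces SELECT * FROM sales LIMIT 100.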

 

 






