A number of methods need to be implemented to get the processing logic of the step to work. Some methods are different depending on whether it is a Row Step or a Cached Step.

 

Common methods

Following are some of the common methods for defining how a step processes data.

MethodDescription & Example
public ETLStepAPIVersion getAPIVersion()

This method is used to define the version of the Yellowfin Step API to maintain step updates. This is usually the latest version in the enum ETLStepAPIVersion. The API version is used to determine compatibility.


@Override
public ETLStepAPIVersion getAPIVersion() {
    return ETLStepAPIVersion.V1;
}
public Collection<ETLException> validate()

This method performs pre-run validations. The implementation should check if any mandatory options were not set up, or if credentials are incorrect or a host is reachable. Errors should be captured in an instance of ETLException and added to the collection returned from the method. Alternatively, one could use the convenience method getInvalidConfigETLException() instead of constructing an ETLException.

 

@Override
public Collection<ETLException> validate() {
	List<ETLException> validationErrors = new ArrayList<>();
 	       
	String exampleOption = this.getStepOption("APPEND_VALUE");
	if (exampleOption == null) {
		// Add a generic message "Step not Properly Configured"
		validationErrors.add(this.getInvalidConfigETLException());
	}
 
	try {
		Integer.parseInt(exampleOption);
	} catch (NumberFormatException e) {
		ETLException ve = new ETLException(ETLElement.STEP, getUuid(), null, "Option is not a number", e);
		validationErrors.add(ve);
	}
	return validationErrors;
}
public Map<String, String> getValidatedStepOptions()

This method allows step options to be validated. Mapping of optionKey to optionValue will be saved in the Yellowfin config database. This is where one can remove invalid option values. Manipulating the mapping returned by this.getStepOptions() will have no effect.

 

@Override
public Map<String, String> getValidatedStepOptions() {
    Map<String, String> stepOptions = this.getStepOptions();
    String exampleOption = stepOptions.get("APPEND_VALUE");
    if (exampleOption == null) {
        // Remove option if the value is no longer set
        stepOptions.remove("APPEND_VALUE");
    } else {        	
        try {
            Integer.parseInt(exampleOption);
        } catch (NumberFormatException e) {
            // Remove option if the value is not an integer
            stepOptions.remove("APPEND_VALUE");
        }
    }
    // Return the map of valid options
    return stepOptions;
}


public void setupGeneratedFields()

Implement this method when the step is required to output a new field. The data in the new field may be generated using other fields. The new field may also replace an existing field or be a duplicate. Yellowfin provides convenience methods for each operation. This method is expected to create a new instance of ETLStepMetadataFieldBean or duplicate an existing field. It is important that this method runs only if fields were not set up before, or if they should be set up again. If they are to be recreated because of a change in an option, the old field must be removed. Otherwise new fields will be generated every time the step is reconfigured.

 

@Override
public void setupGeneratedFields() throws ETLException {
    if (getStepOption("NEW_FIELD") != null) {
        // The field was already set up.
        return;
    }

    ETLStepMetadataFieldBean newField = new ETLStepMetadataFieldBean();
    newField.setFieldName("Concatenated Field");
    newField.setFieldType(ETLDataType.TEXT.name());

    // The sort order is 0 based, so the new field will be at the end
    newField.setSortOrder(getDefaultMetadataFields().size());
	
    // Ensure that the new field is output from the step
    newField.setStepIncludeField(true);
    newField.setUserIncludeField(true);
	
    // This method assigns the field a new UUID and
    // adds a Step Option to help reference it elsewhere.
    this.addNewGeneratedField(newField, "NEW_FIELD");
}


The above example is for generating a new field. To duplicate a field, use the following statement:

this.addGeneratedField(newFieldBean, ETLFieldLinkType.DUPLICATE, originalFieldUUID)


To replace an existing field use:
this.replaceDefaultField(fieldToReplace)

The above line returns a new “replacement” field which links to the original field. You may modify the object, but should not change the linkFieldUUID and linkType.


Use the below code to get back the original field and delete the replacement field.

this.restoreReplacedField(replacementField)

public Integer getMinInputSteps()

public Integer getMaxInputSteps()

public Integer getMinOutputSteps()

public Integer getMaxOutputSteps()


These methods should be overridden if the step has multiple inputs or outputs. Yellowfin provides default implementations which return min/max values based on the Step Category. These values are defined in the ETLStepCategory enum element for that category.

YFLogger

Although this isn’t a method, it is common for all steps. A step can write to the Data Transformation log using the YFLogger. It should be declared as an instance variable. YFLogger is a wrapper for log4j’s Logger class.

 

private static final YFLogger log = YFLogger.getLogger(TestStep.class.getName());

 

 

 


 

Row Step Implementation 

A Row step extends the AbstractETLRowStep class. It requires the implementation of only one method, processWireData().


processWireData();

When the framework invokes processWireData(), data from each column of the current row is already on the correct Wire. Each wire is mapped to a metadata field and may be accessed using this.getWireForField(fieldUUID). Data should be retrieved from the wire, processed, and put back on the same wire or a different one.


 

Here’s a sample implementation for appending a number to a specific field.

@Override
protected boolean processWireData(List<ETLStepMetadataFieldBean> fields)
                                  throws ETLException, InterruptedException {
    	
    // The options should've been validated by the validate() method,
    // so no need for further checks here
    String appendFieldUUID = this.getStepOption("APPEND_FIELD");
    String newFieldUUID = this.getStepOption("NEW_FIELD");
    String appendValue = this.getStepOption("APPEND_VALUE");
 
    Wire<Object, String> appendFieldWire = this.getWireForField(appendFieldUUID);
    Wire<Object, String> newFieldWire = this.getWireForField(newFieldUUID);
 
    Object data = appendFieldWire.getValue();
    String newFieldData = null;
    if (data == null) {
        newFieldData = appendValue;
    } else {
        newFieldData = data.toString() + appendValue;
    }
	
    newFieldWire.send(newFieldData);
	
    return true;
}

In this example, processWireData() runs for every row of data. In a real-world implementation, objects which will not change in subsequent method calls, should be cached in member variables. For example, appendFieldUUID, newFieldUUID, appendValue, appendFieldWire and newFieldWire should be member variables, populated only when processWireData() runs for the first time.






Cached Step Implementation 

A cached step extends AbstractETLCachedStep. Only one method, processEndRows(), needs to be implemented. Cached steps usually have more than one input step. Data extraction steps are also often implemented as a cached step because the step will have no input. It generates data by executing an SQL query, for instance.


processEndRows();