Thursday, June 23, 2016

Parsing fixed width text with Java

...the unnecessarily complicated way.

Recently I was asked to parse text files with fixed-width columns (see also ISAM).
The task is menial, but I had two different schemas and particularly large files (77 fields and about 21,000 records).
The schemas were described in a table listing the name and the width of each field.
Here's what I did:

  1. pasted the table into Excel
  2. used a formula to clean up the field names (removing spaces, hyphens, parentheses, whatever)
  3. used another formula to build a Java declaration for each field
  4. created a class (with Lombok) with the record structure

Here's a (short) example.

field name | field width | field name (Java) | Java declaration
flg-store  | 1           | flg_store         | @Column(width=1, index=47) private String flg_store;
Code NEW   | 10          | code_new          | @Column(width=10, index=58) private String code_new;
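
For readers who would rather skip the spreadsheet, the two formulas can be approximated in Java. This is only a sketch: the class name DeclarationBuilder and its methods are hypothetical, and the cleanup rule (lowercase, every run of non-alphanumeric characters becomes one underscore) is an assumption chosen to reproduce the two rows above, not the exact Excel formula.

```java
import java.util.Locale;

// Hypothetical helper: mirrors the two spreadsheet formulas in plain Java.
final class DeclarationBuilder {

    private DeclarationBuilder() {}

    // Assumed rule: "flg-store" -> "flg_store", "Code NEW" -> "code_new"
    static String toJavaName(String rawName) {
        String cleaned = rawName.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")  // each run of other characters becomes one underscore
                .replaceAll("^_+|_+$", "");     // drop leading and trailing underscores
        // A Java identifier cannot start with a digit, so prefix one if needed.
        if (!cleaned.isEmpty() && Character.isDigit(cleaned.charAt(0))) {
            return "_" + cleaned;
        }
        return cleaned;
    }

    // Builds the annotated field declaration for one row of the schema table.
    static String toDeclaration(String rawName, int width, int index) {
        return String.format(Locale.ROOT,
                "@Column(width=%d, index=%d) private String %s;",
                width, index, toJavaName(rawName));
    }

    public static void main(String[] args) {
        // prints: @Column(width=1, index=47) private String flg_store;
        System.out.println(toDeclaration("flg-store", 1, 47));
        // prints: @Column(width=10, index=58) private String code_new;
        System.out.println(toDeclaration("Code NEW", 10, 58));
    }
}
```

Names that differ only in punctuation would collide under this rule, so a real schema may need a manual check of the generated identifiers.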

And here's the class I built:

import lombok.Data;

@Data
public class Item
{
 @Column(width=1, index=0) private String flg_store;
 @Column(width=10, index=1) private String code_new;
}

So the objects of this class will hold the data parsed from each record. The instructions for parsing are attached to each field via annotations.
Of course any other configuration method would do (property file, XML, JSON, even the Excel file itself), but I find that having the code and the schema all in the same place is very effective.
Here is the annotation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.FIELD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Column
{

 public int width();
 public int index();
 
}
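
The RUNTIME retention is what makes the annotation visible to reflection. As a rough illustration, the sketch below reads the annotation back and turns the widths into start offsets. Everything is nested inside a hypothetical ColumnDemo class so the example is self-contained; the Column and Item types here are local copies for the demo, and the describe method is an invented name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; Column and Item are local copies for illustration.
final class ColumnDemo {

    @Target(ElementType.FIELD)
    @Retention(RetentionPolicy.RUNTIME) // without RUNTIME, getAnnotation() would return null
    @interface Column {
        int width();
        int index();
    }

    static class Item {
        // Declared out of order on purpose: ordering comes from index, not from the source.
        @Column(width = 10, index = 1) private String code_new;
        @Column(width = 1, index = 0) private String flg_store;
        private String notMapped; // no annotation, so it is ignored
    }

    // Lists each annotated field with its computed start offset and width.
    static List<String> describe(Class<?> type) {
        List<Field> annotated = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Column.class)) {
                annotated.add(f);
            }
        }
        annotated.sort(Comparator.comparingInt(f -> f.getAnnotation(Column.class).index()));

        List<String> lines = new ArrayList<>();
        int offset = 0;
        for (Field f : annotated) {
            Column c = f.getAnnotation(Column.class);
            lines.add(f.getName() + ": offset=" + offset + ", width=" + c.width());
            offset += c.width(); // the next field starts where this one ends
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(Item.class).forEach(System.out::println);
    }
}
```

Sorting by index matters because getDeclaredFields() makes no promise about the order in which fields are returned.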

OK, so how do we use this information? I found it easier to first build a list of parsing instructions, and then parse each line with it to build the object.

package it.digiwrite.tiis.sap.parser;

import it.digiwrite.anoto.utility.ReflectionHelper;

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import lombok.Data;

public class FixedWidthParser<T>
{
 
 @Data static class FieldDescriptor
 {
  int index;
  int width;
  String fieldName;
 }
 
 // parsing instructions, one per annotated field, sorted by column index
 List<FieldDescriptor> mapIndexFieldName;
 
 public FixedWidthParser(Class<T> what)
 {
  mapIndexFieldName = new ArrayList<FieldDescriptor>();
  for (Field field : what.getDeclaredFields())
  {
   Column column = field.getAnnotation(Column.class);
   if (column == null) continue; // field is not mapped to a column
   FieldDescriptor fd = new FieldDescriptor();
   fd.setFieldName(field.getName());
   fd.setIndex(column.index());
   fd.setWidth(column.width());
   mapIndexFieldName.add(fd);
  }
  // getDeclaredFields() returns the fields in no guaranteed order,
  // so sort explicitly instead of inserting at position index
  mapIndexFieldName.sort(Comparator.comparingInt(FieldDescriptor::getIndex));
 }

 public T parseLine(String line, Class<T> clazz)
 {
  try
  {
   int lastPosition = 0;
   T item = clazz.getDeclaredConstructor().newInstance();
   
   ReflectionHelper rh = new ReflectionHelper(item); // 1
   
   for (FieldDescriptor fd : mapIndexFieldName)
   {
    // each field starts where the previous one ended
    String token = line.substring(lastPosition, lastPosition + fd.width);
    lastPosition += fd.width;
    rh.set(fd.fieldName, token.trim()); // 2
   }
   return item;
  }
  catch (Exception e)
  {
   // keep the original exception as the cause
   throw new RuntimeException("Error parsing line: " + line, e);
  }
 }
 
}
NOTE:
  1. ReflectionHelper is a utility class that wraps an object; here it is used only to...
  2. ...easily set a field via reflection
  3. this code uses generics
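
If pulling in a helper class feels like too much, a field can also be written directly through java.lang.reflect.Field, bypassing the setter. The following is a sketch of that alternative, not what the parser above does; DirectFieldSetter and its nested Item are hypothetical names, and the lookup only searches the object's own class, not its superclasses.

```java
import java.lang.reflect.Field;

// Hypothetical alternative to ReflectionHelper.set(): writes the field itself, no setter needed.
final class DirectFieldSetter {

    private DirectFieldSetter() {}

    // Minimal stand-in for the record class, written without Lombok.
    static class Item {
        private String flg_store;
        private String code_new;

        String getFlg_store() { return flg_store; }
        String getCode_new() { return code_new; }
    }

    // Sets a (possibly private) field declared on the target's own class.
    static void set(Object target, String fieldName, Object value) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // allow access to private fields
            field.set(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot set field '" + fieldName + "'", e);
        }
    }

    public static void main(String[] args) {
        Item item = new Item();
        set(item, "flg_store", "Y");
        set(item, "code_new", "0000011111");
        System.out.println(item.getFlg_store() + " " + item.getCode_new());
    }
}
```

The trade-off is that direct field access skips any logic a setter might contain, which does not matter for plain data holders like the one in this post.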

And how do we use this?

package it.digiwrite.tiis.sap;

import static org.junit.Assert.assertEquals;
import it.digiwrite.tiis.sap.bean.Item;
import it.digiwrite.tiis.sap.parser.FixedWidthParser;

import org.junit.Test;

public class FixedWidthParserTest
{

 @Test
 public void flusso()
 {
  // 1 character for flg_store, followed by 10 characters for code_new
  String line = "Y0000011111";
  FixedWidthParser<Item> fwp = new FixedWidthParser<Item>(Item.class);
  Item item = fwp.parseLine(line, Item.class);
  assertEquals("Y", item.getFlg_store());
  assertEquals("0000011111", item.getCode_new());
 }
 
}
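
The test parses a single string, while the real input is a file with thousands of records. One way to feed it line by line is sketched below. RecordFileReader and parseAll are hypothetical names, and skipping empty lines is an assumption about the file format. The per-line parser is passed in as a function so the sketch stays self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: reads records one line at a time and delegates each line to a parser.
final class RecordFileReader {

    private RecordFileReader() {}

    static <T> List<T> parseAll(BufferedReader reader, Function<String, T> lineParser) {
        List<T> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // assumption: empty lines carry no record
                }
                records.add(lineParser.apply(line));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader so the sketch runs without a file.
        BufferedReader reader = new BufferedReader(new StringReader("Y0000011111\nN0000022222\n"));
        // The lambda is a placeholder that extracts only the first column.
        List<String> flags = parseAll(reader, line -> line.substring(0, 1));
        System.out.println(flags);
    }
}
```

With the parser from this post, the call would look like parseAll(reader, line -> fwp.parseLine(line, Item.class)), with the reader wrapping a FileReader opened in a try-with-resources block.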


NOTE 2: here is the code for ReflectionHelper
package ...;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionHelper
{

    private Object obj; // wrapped object
    private Class klass; // wrapped object Class
    private String className; // wrapped object class name, without the package

    public ReflectionHelper(Object obj)
    {
        this.obj = obj;
        this.klass = obj.getClass();
        String className = klass.getName();
        // + 1 skips the dot; it also copes with classes in the default package (lastIndexOf returns -1)
        this.className = className.substring(className.lastIndexOf(".") + 1);
    }

    public String[] getFieldsNames()
    {
        Field[] fields = klass.getDeclaredFields();
        String[] fieldsNames = new String[fields.length];
        int i = 0;
        for (Field f : fields)
        {
            fieldsNames[i++] = f.getName();
        }
        return fieldsNames;
    }

    public String getClassName()
    {
        return className;
    }

    public Class getObjectClass()
    {
        return klass;
    }

    /**
     * Gets the property "itemName" from the wrapped object "obj", equivalent to: obj.getItemName()
     * @param itemName
     * @return obj.getItemName
     * @throws Exception
     */
    public Object get(String itemName) throws Exception
    {
        Method get;
        itemName = capitalize(itemName);
        try
        {
            get = klass.getMethod("get" + itemName, new Class[0]);
        }
        catch (NoSuchMethodException nsme)
        {
            // boolean properties use the "is" prefix instead of "get"
            get = klass.getMethod("is" + itemName, new Class[0]);
        }
        return get.invoke(obj, new Object[0]);
    }

    /**
     * 
     * @param itemName
     * @return string with the first letter uppercase (ex. "example" --> "Example")
     */
    private String capitalize(String itemName)
    {
        return itemName.substring(0, 1).toUpperCase() + itemName.substring(1);
    }

    public Class fieldType(String fieldName) throws Exception
    {
        Class x = null;
        Class clazz = klass;
        while (clazz != null && (x == null))
        {
            try
            {
                Field field = clazz.getDeclaredField(fieldName);
                x = field.getType();
            }
            catch (NoSuchFieldException nsfe)
            {
                clazz = clazz.getSuperclass();
            }
        }
        if (x == null)
        {
            throw new RuntimeException("No such field: '" + fieldName + "'");
        }
        return x;
    }

    public void set(String fieldName, Object value) throws Exception
    {
        String methodName = "set" + capitalize(fieldName);
        Class class1 = null;
        try
        {
            class1 = fieldType(fieldName);
            Method set = klass.getMethod(methodName, new Class[] { class1 });
            Object[] args = new Object[] { value };
            set.invoke(obj, args);
        }
        catch (Exception e)
        {
            throw new RuntimeException("class: " + class1 + "; method: " + methodName,e);
        }
    }

    public String executeVoidToString(String methodName)
    {
        try
        {
            Method get = klass.getMethod(methodName, new Class[0]);
            return (String) get.invoke(obj, new Object[0]);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
    }

    /**
     * "RogerRabbit" --> "roger_rabbit"
     */
    public static String camelCaseToDb(String fieldName)
    {
        if ((fieldName==null)||(fieldName.length()==0)) return fieldName;
        String charE = fieldName.substring(0, 1);
        boolean lowerCase = charE.toLowerCase().equals(charE);
        String result = charE;
        for (int i=1; i<fieldName.length(); i++)
        {
            charE=fieldName.substring(i,i+1);
            boolean isThisTheCase = charE.toLowerCase().equals(charE);
            if (lowerCase && !isThisTheCase) result+="_";
            result+=charE.toLowerCase();
            lowerCase=isThisTheCase;
        }
        return result;
    }

}

Comments:

Unknown said...

Can you help me with ReflectionUtility class which I am not finding here?

Prasad Chaudhari said...

Hi, thanks for this awesome solution, can you please help me with ReflectionUtlity source code? thanks in advance. cheers

Manrico Corazzi said...

Updated the post to include Reflection Helper code.

Prasad Chaudhari said...

thanks a lot
