About

Transcribe is a project I started during training at work when I was challenged to write something awesome.  The product that the company made often needed to have customer data imported into it during implementations, and there were two very fragile, narrowly focused and outdated ways to do so: either by using the built-in import (which involved DBF files from the days of FoxPro and code that hadn't been dusted since the 1970s) or by using a buggy replacement that sometimes worked and sometimes didn't.

I decided that a completely new approach was needed, focusing on the generic need to get a load of data files into a database rather than contorting existing data into a specific format.

Transcribe allows a user to specify a number of files from which tables of data can be loaded (these could be a traditional CSV file or even a table in another database), a target database, and column mappings from one table to another.  Transcribe supports three kinds of mapping: simple mappings copy a source column to a target column; lookup mappings look up a source column's value in a table in the database and put the resulting value in the target column; and increment mappings put an incrementing series of numbers into a target column, starting and stepping at user-defined values.
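The three kinds of mapping can be illustrated with a small sketch.  This is not Transcribe's own code (Transcribe is a .NET application); it is a minimal Python illustration, and every name in it (simple_mapping, lookup_mapping, increment_mapping, apply_mappings, the sample columns) is hypothetical.

```python
from itertools import count


def simple_mapping(source_column):
    """Copy the source column's value straight into the target column."""
    return lambda row: row[source_column]


def lookup_mapping(source_column, lookup_table, default=None):
    """Look the source value up in a table and return the resulting value."""
    return lambda row: lookup_table.get(row[source_column], default)


def increment_mapping(start, step):
    """Ignore the row and hand out start, start + step, start + 2 * step, ..."""
    counter = count(start, step)
    return lambda row: next(counter)


def apply_mappings(source_rows, mappings):
    """Build one target row per source row by running each target column's mapping."""
    return [
        {target: mapping(row) for target, mapping in mappings.items()}
        for row in source_rows
    ]


# Hypothetical sample data: two source rows and a lookup table that stands in
# for a table in the target database.
source_rows = [
    {"name": "Ada", "country": "UK"},
    {"name": "Linus", "country": "FI"},
]
country_ids = {"UK": 44, "FI": 358}

mappings = {
    "FullName": simple_mapping("name"),
    "CountryId": lookup_mapping("country", country_ids),
    "SortOrder": increment_mapping(start=10, step=5),
}
target_rows = apply_mappings(source_rows, mappings)
```

Here each target column is described by one function of the source row, which is roughly the idea of mapping one table's columns onto another's.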

Platform: Microsoft .NET 3.5

License: You are more than welcome to use this to take over the world, just don't come crying to me when that doesn't work out for you. Install at your own risk, don't claim it's your work, and don't charge other people for it.

Download

Documentation

Installing

Installing is simply a case of following the steps in the installer.  There's nothing too complicated there.

Using

To start a new project, click the New Project button (white paper with a star at the top left, on the top toolbar).

To add some sources, click the Add Source button (filing cabinet with a star at the bottom right, on the second toolbar), choose your source type and choose a file (or whatever else is needed).

To choose a target, click the database button (yellow cylinder) on the toolbar, choose a type and choose a file (or whatever else is needed).

To configure the target schema's deduplication properties, go to Window > Target Schema.  A tree view on the left-hand side displays the tables, with their columns listed under each one.  For each table you can choose a custom duplication checker, and for each column you can specify a score that the standard duplication checker will use in its own calculations.  Any columns that don't affect duplication should be scored as 0.  Two rows appear duplicated when the scores of the columns whose values are duplicated sum to 1 or more.
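The scoring rule can be sketched as follows.  This is a hedged Python illustration rather than the standard duplication checker itself: it assumes that a column counts as duplicated when its two values are exactly equal, which the documentation does not specify, and the column names and scores are made up.

```python
def duplicate_score(row_a, row_b, column_scores):
    """Sum the scores of every column whose values are equal in both rows."""
    return sum(
        score
        for column, score in column_scores.items()
        if row_a.get(column) == row_b.get(column)
    )


def is_duplicate(row_a, row_b, column_scores, threshold=1.0):
    """Rows appear duplicated when the summed score reaches the threshold of 1."""
    return duplicate_score(row_a, row_b, column_scores) >= threshold


# Hypothetical scores: a matching email is decisive on its own, surname and
# postcode only count together, and notes never affect duplication.
column_scores = {"email": 1.0, "surname": 0.5, "postcode": 0.5, "notes": 0.0}

row_a = {"email": "ada@example.com", "surname": "Lovelace", "postcode": "W1", "notes": "x"}
row_b = {"email": "a.l@example.com", "surname": "Lovelace", "postcode": "W1", "notes": "y"}
row_c = {"email": "other@example.com", "surname": "Lovelace", "postcode": "N1", "notes": "x"}

print(is_duplicate(row_a, row_b, column_scores))  # surname + postcode = 1.0
print(is_duplicate(row_a, row_c, column_scores))  # surname + notes = 0.5
```

Scoring a column at 1 makes it decisive on its own, while fractional scores only flag a pair when several such columns agree.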

To see duplicates, go to Window > Deduplication and click Refresh.  If all is well, you can choose a target table in the combo box, which will display the rows of that target table.  Choosing a row will display its duplicated rows on the right-hand side, each with a reason why it appears duplicated.

To import, go to Window > Import.  If all is well, a list of tables in the target database will appear on the left-hand side; select one to view the data in it.  Clicking 'Make Import Go Now' will start the transfer.

This is an initial release and if it falls over at all it will most likely fail silently or terminate.

Extending

Transcribe should pick up third-party assemblies in its program directory automatically.  If it doesn't, you might want to try the plugins folder instead.

There are three points where you can extend Transcribe:

RowSource

A row source represents a flat file such as a CSV file.  You should inherit RowSource and override two methods: GetData, which should return a SourceDataTable of the data in the file, and Save, which should write that data back, since Transcribe allows users to edit the data.

You should also create a class that implements IFactory and implement its CreateObject function, which prompts the user for where to look for the file or source.  You should then decorate the RowSource class you made earlier with a RowSourceAttribute, passing the type of the IFactory you made.

If your source comes straight from a file, your IFactory can inherit FileRowSourceFactory and you can decorate your RowSource with a FileRowSourceAttribute.  FileRowSourceFactory includes code to display an OpenFileDialog to the user and lets you set the filter in the attribute so everything's nice and easy for you.
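The division of labour between a row source and its factory can be sketched in Python.  This is only an analogy for the .NET classes described above, not their actual signatures: CsvRowSource, get_data, save and create_csv_row_source are hypothetical names, and a plain callable stands in for the dialog that asks the user for a file.

```python
import csv


class CsvRowSource:
    """Hypothetical stand-in for a RowSource subclass backed by a CSV file."""

    def __init__(self, path):
        self.path = path
        self.fieldnames = []

    def get_data(self):
        """Read the file and return its rows as a list of dicts (the GetData role)."""
        with open(self.path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            rows = list(reader)
            # Remember the header so that save() can write the same columns back.
            self.fieldnames = list(reader.fieldnames or [])
        return rows

    def save(self, rows):
        """Write the (possibly edited) rows back to the same file (the Save role)."""
        fieldnames = self.fieldnames or (list(rows[0]) if rows else [])
        with open(self.path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


def create_csv_row_source(ask_for_path):
    """Plays the factory's role: ask where the file is, then build the source."""
    return CsvRowSource(ask_for_path())
```

A caller would pass in whatever prompts the user, for example create_csv_row_source(lambda: input("CSV file: ")), then call get_data, edit the rows and call save.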

TargetProvider

You can allow Transcribe to import to your own database by creating a class that inherits TargetProvider.  You should override:

  • FillLookupTable, which should fill the specified table with data from the table specified in lookupEntity, in particular resultField and lookupField.  You should also select 'Database' AS _Source_Table and 0 AS _Source_Row
  • FillTable, which should fill the specified table (use table.TableName) with data from the database, including _Source_Table and _Source_Row as above
  • GetEntities, which returns a collection of TargetEntities along with each entity's columns
  • CreateImporter, which returns a TableImporter for the specified table.  You should make a TableImporter class for your target provider that is able to take a DataTable and plonk it in the database.
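The query that FillLookupTable needs to run can be sketched as follows.  This is a hedged illustration using Python and an in-memory SQLite database in place of a real target provider; the function names and the Country table are hypothetical, and only the 'Database' AS _Source_Table and 0 AS _Source_Row columns come from the description above.

```python
import sqlite3


def quote_identifier(name):
    """Quote a table or column name for SQLite, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_lookup_query(table_name, lookup_field, result_field):
    """Select the two lookup columns plus the constant source-tracking columns."""
    return (
        f"SELECT {quote_identifier(lookup_field)}, {quote_identifier(result_field)}, "
        f"'Database' AS _Source_Table, 0 AS _Source_Row "
        f"FROM {quote_identifier(table_name)}"
    )


def fill_lookup_table(connection, table_name, lookup_field, result_field):
    """Run the query and return one dict per row, keyed by column name."""
    columns = [lookup_field, result_field, "_Source_Table", "_Source_Row"]
    cursor = connection.execute(build_lookup_query(table_name, lookup_field, result_field))
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical target database with a single lookup table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Country (Code TEXT, Id INTEGER)")
connection.executemany("INSERT INTO Country VALUES (?, ?)", [("UK", 44), ("FI", 358)])

lookup_rows = fill_lookup_table(connection, "Country", "Code", "Id")
```

Every row that comes from the database carries the same constant _Source_Table and _Source_Row values.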

If your database has an ADO.NET Data Provider, you can inherit from DbTargetProvider instead, which does much of the legwork for you.  You just have to override some obvious functions to create connections and data adapters and to generate SQL commands.

IDuplicateChecker

If you need to specify your own duplicate checking logic for a particular table, you can create a class implementing IDuplicateChecker and select it in the Target Schema window.  IDuplicateChecker exposes the IsDuplicate function, which checks two data rows (both of which will be in the compiled dataset) and returns a DuplicateCheckResult containing information about a potential match.  Two rows are regarded as duplicates if DuplicateCheckResult.Score >= 1.

For each target table in the compiled dataset, a duplicates table is made, which relates two rows together along with a reason.
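A custom checker and the duplicates table it feeds can be sketched like this.  The Python below is an analogy rather than the real .NET interface: the DuplicateCheckResult fields, the EmailDuplicateChecker class and build_duplicates_table are assumptions made for illustration, and comparing every pair of rows is simply the most direct way to fill such a table, not necessarily what Transcribe does.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class DuplicateCheckResult:
    score: float
    reason: str


class EmailDuplicateChecker:
    """Hypothetical checker: rows match when emails agree, ignoring case and spaces."""

    def is_duplicate(self, row_a, row_b):
        email_a = row_a["email"].strip().lower()
        email_b = row_b["email"].strip().lower()
        if email_a == email_b:
            return DuplicateCheckResult(1.0, f"same email: {email_a}")
        return DuplicateCheckResult(0.0, "emails differ")


def build_duplicates_table(rows, checker):
    """Compare every pair of rows once and keep the pairs that score 1 or more."""
    duplicates = []
    for (index_a, row_a), (index_b, row_b) in combinations(enumerate(rows), 2):
        result = checker.is_duplicate(row_a, row_b)
        if result.score >= 1:
            duplicates.append((index_a, index_b, result.reason))
    return duplicates


# Hypothetical rows from one target table in the compiled dataset.
rows = [
    {"email": "Ada@Example.com"},
    {"email": "linus@example.com"},
    {"email": " ada@example.com "},
]
duplicates = build_duplicates_table(rows, EmailDuplicateChecker())
```

Each entry in duplicates relates two row positions together along with the reason the checker gave.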

Comments

As ever, please send comments, crash reports and suggestions through the contact form.  Thanks.