Origin Data

Origins Info is a segmentation system which classifies consumers according to the part of the world from which their forebears are most likely to have originated.

Each consumer on a customer file can be placed into one of 180 different ‘Origins’ types on the basis of their personal and family names. The segmentation could be used, for example, to identify people on a customer file whose ancestry is most likely to be from Ireland, Italy, Albania or Myanmar.

Origins Info uses the same information to code customers on the basis of their most likely language and religion. An age estimate and gender can also be appended for most customers.

Origins is used in three different ways:

1 : Origins is used to profile customers and customer segments. By profiling customers you can identify which groups are under- or over-represented on your customer file. You can find out which groups prefer which products, channels and outlets, which ones you are good or poor at retaining, and which are responsive to which types of promotion or reward.

2 : Origins is used to code customers. By coding customers you can target campaigns to improve awareness and take-up of public services by members of specific minority groups. You can also target products, such as cosmetics, media channels and travel, at audiences for whom they have been specially developed.

3 : Origins is used to classify postcodes. Using a table which identifies the dominant Origins type in each postcode you can identify the locations in which individual communities have established themselves right down to street level.
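
The profiling described in the first use above can be illustrated with a simple index of representation. This is only a minimal sketch: the type labels, the sample customer file and the base-population shares are all hypothetical, and the index formula (customer share divided by base share, times 100) is a common profiling convention rather than anything confirmed as the Origins method.

```python
from collections import Counter


def representation_index(customer_types, base_shares):
    """Index each type against a base population (100 = parity).

    customer_types: list of type labels, one per coded customer.
    base_shares: dict of type label -> share of the base population (0 to 1).
    Both inputs are hypothetical; the source does not specify a base.
    """
    counts = Counter(customer_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    index = {}
    for origin_type, base_share in base_shares.items():
        if base_share <= 0:
            continue  # cannot index against an empty base
        customer_share = counts.get(origin_type, 0) / total
        index[origin_type] = round(100 * customer_share / base_share)
    return index


# Hypothetical coded customer file and base-population shares.
customers = ["IRISH"] * 30 + ["ITALIAN"] * 10 + ["ENGLISH"] * 60
base = {"IRISH": 0.15, "ITALIAN": 0.10, "ENGLISH": 0.75}

index = representation_index(customers, base)
print(index)
```

An index above 100 suggests a group is over-represented on the file relative to the chosen base, and an index below 100 suggests it is under-represented.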

How does it work?

To code individual customers, Origins uses a table holding information on over 540,000 personal names and over 190,000 family names. Each name has been examined to identify the Origins type to which it is most likely to belong. This evaluation draws on a number of criteria, including the Origins codes of the surnames held by bearers of each personal name, and vice versa; the geographical concentration of the name both within and between countries; the Mosaic codes in which the name is mostly found; and the appearance of diagnostic letter sequences. The evaluation also establishes the confidence with which a particular name can be assigned to a particular Origins type. By looking at the codes associated with both the personal name and the family name, and taking into account the confidence level of each, Origins identifies the type to which each customer name is most likely to belong.
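
The idea of combining a personal-name code and a family-name code, each with its own confidence, can be sketched as follows. The lookup tables, type labels, confidence values and the combination rule are all assumptions made for illustration; the source does not say how Origins actually weighs the two names.

```python
# Hypothetical lookup tables: name -> (type label, confidence from 0 to 1).
PERSONAL_NAMES = {
    "giovanni": ("ITALIAN", 0.9),
    "john": ("ENGLISH", 0.6),
    "mikko": ("FINNISH", 0.9),
}
FAMILY_NAMES = {
    "rossi": ("ITALIAN", 0.95),
    "virtanen": ("FINNISH", 0.95),
    "smith": ("ENGLISH", 0.7),
}


def code_customer(personal_name, family_name):
    """Return (type, confidence) for a name pair, or (None, 0.0) if unknown."""
    personal = PERSONAL_NAMES.get(personal_name.strip().lower())
    family = FAMILY_NAMES.get(family_name.strip().lower())
    if personal is None and family is None:
        return (None, 0.0)
    if personal is None:
        return family
    if family is None:
        return personal
    (p_type, p_conf), (f_type, f_conf) = personal, family
    if p_type == f_type:
        # Assumed rule: agreeing names count as independent evidence.
        return (p_type, 1 - (1 - p_conf) * (1 - f_conf))
    # Assumed rule: on disagreement, keep the more confident code,
    # discounted by the confidence of the competing one.
    if p_conf >= f_conf:
        return (p_type, p_conf * (1 - f_conf))
    return (f_type, f_conf * (1 - p_conf))


print(code_customer("Giovanni", "Rossi"))
print(code_customer("John", "Virtanen"))
```

In this sketch two names that agree yield a higher confidence than either alone, while names from different traditions yield a lower one, which is the behaviour the paragraph above describes in general terms.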


The level of accuracy varies from one Origins type to another. Origins Info achieves accuracy rates in excess of 90% in identifying South Asians and Muslims, and 70% in identifying Black Africans, Greeks, Armenians and people from East and South East Europe. It achieves accuracy rates of 50% with Hispanics. Lower accuracy rates are achieved with people of Nordic or French origin, with Jews and with Black Caribbeans.

As would be expected, the system is more accurate when coding names to a general category, such as South Asians, or Greeks and Greek Cypriots taken together, than to a specific sub-category, such as Sri Lankans or Greek Cypriots.

Origins Info can be used to identify persons whose names come from more than one tradition – for example a person with an English personal name and a Finnish family name. The confidence score given to each name combination can also be used to select or deselect people who are most likely to be of mixed ancestry. Restricting a communication to names with high confidence scores is an effective way of avoiding communicating with individuals who are least likely to belong to the selected target group.
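
Selecting on confidence and on mixed name traditions might look like the sketch below. The record layout, field names, type labels and the 0.8 threshold are hypothetical choices for illustration; the source does not describe the format in which codes and confidence scores are delivered.

```python
from dataclasses import dataclass


@dataclass
class CodedCustomer:
    name: str
    personal_type: str  # type assigned to the personal name
    family_type: str    # type assigned to the family name
    origins_type: str   # overall type assigned to the name pair
    confidence: float   # 0 to 1, higher means more certain


def is_mixed(customer):
    """True when the two names come from different traditions."""
    return customer.personal_type != customer.family_type


def select_targets(customers, target_type, min_confidence=0.8,
                   include_mixed=False):
    """Keep customers coded to target_type with enough confidence."""
    selected = []
    for c in customers:
        if c.origins_type != target_type:
            continue
        if c.confidence < min_confidence:
            continue
        if is_mixed(c) and not include_mixed:
            continue
        selected.append(c)
    return selected


# Hypothetical coded records.
customer_file = [
    CodedCustomer("Mikko Virtanen", "FINNISH", "FINNISH", "FINNISH", 0.99),
    CodedCustomer("John Virtanen", "ENGLISH", "FINNISH", "FINNISH", 0.85),
    CodedCustomer("Anna Lahti", "FINNISH", "FINNISH", "FINNISH", 0.55),
]

strict = select_targets(customer_file, "FINNISH")
print([c.name for c in strict])
```

Raising `min_confidence` narrows a mailing to the names most likely to belong to the target group, and `include_mixed=True` brings back people whose two names point to different traditions.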