[personal profile] dtm
Or, poke yourself, or something. Whatever's appropriate.

I've got a big huge problem to solve at work and I'd like to look up what's in the literature on solving problems like it, but I'm sufficiently distant from academic computer science that I don't know what the term is for this type of problem or where to begin.

Or, as [livejournal.com profile] mizkit said, “Help me LJ-wan Kenobi! You're my only hope!”

Here's the deal:

My company processes financial data: prices on stocks, options, ETFs, etc. We get data from a ton of different vendors and supply it to our clients. (The main service we provide is having people in operations 24/7 to deal with the regular flakeouts of different financial data vendors. If all our vendors were perfectly regular and reliable, it'd put a serious dent in our business.)

Now, our clients will often get the idea that they should be able to use data from different vendors, and here's where we get to the problem. Different vendors use different ways of identifying the things they're giving pricing information on. One vendor will use CUSIP, one will use SEDOL, one will use ISIN + exchange code, some use their own private symbols (e.g. "RIC", the Reuters Instrument Code), etc.
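To give a flavor of the easy end of this: for US and Canadian securities, the ISIN is just a country prefix, the nine-character CUSIP, and a Luhn check digit, so that one conversion can be computed. Here's a rough sketch of it. The function names are made up for illustration, and nothing like this works for SEDOLs or RICs, which need actual cross-reference data.

```python
def luhn_check_digit(digits: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every other digit starting with the rightmost.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def cusip_to_isin(cusip: str, country: str = "US") -> str:
    """Build an ISIN from a 9-character CUSIP (hypothetical helper)."""
    if len(cusip) != 9 or not cusip.isalnum():
        raise ValueError(f"not a 9-character CUSIP: {cusip!r}")
    body = (country + cusip).upper()
    # ISIN rule: letters expand to two-digit numbers (A=10 ... Z=35) before Luhn.
    expanded = "".join(str(int(ch, 36)) for ch in body)
    return body + str(luhn_check_digit(expanded))
```

For example, `cusip_to_isin("037833100")` gives `"US0378331005"`. That still leaves the exchange-code half of "ISIN + exchange code" untouched, which is where the real trouble starts.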

Also, the things they give information about don't always line up: one vendor will carry information about prices on individual exchanges, another will track only one exchange per item, another will send us aggregated values per country (which ends up being per exchange in some small countries, but not in, say, the US), another will claim to carry only per-exchange prices but will also (or will instead) supply an aggregate value for all the New York City exchanges.

Even within one vendor you get issues where what they're providing data about isn't well-defined: for example, a vendor that normally tracks things by CUSIP may start carrying information on an issue before a CUSIP has been officially assigned - they'll assign an interim identifier that they'll drop at some specified time after they get an official CUSIP value, or whenever they feel like it.

So we have this big problem that we call the "matching problem". The problem is: how do we link together all these disparate bits and still represent the different levels at which we are getting data? (And oh yeah, how do we do this for millions of items each day in a reasonable amount of time?)
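To make the shape of the problem concrete, here's a minimal sketch of the naive version: treat every vendor record as a bag of (scheme, value) identifiers tagged with the level it describes, and group records that share any identifier using union-find. Everything here is an assumption for illustration: the `VendorRecord` fields, the vendor names, and the placeholder identifier values are invented, and the big assumption that "shares an identifier" means "is the same thing" is exactly what breaks down in practice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorRecord:
    vendor: str  # hypothetical vendor name
    level: str   # granularity: "exchange", "country", or "composite"
    ids: tuple   # (scheme, value) pairs carried by this record


def link_records(records):
    """Group records that share any (scheme, value) identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # identifier -> index of the first record carrying it
    for i, rec in enumerate(records):
        for ident in rec.ids:
            j = first_seen.setdefault(ident, i)
            parent[find(i)] = find(j)  # merge the two groups

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec)
    return list(groups.values())


# Placeholder identifiers, not real securities.
sample = [
    VendorRecord("vendor_a", "exchange", (("CUSIP", "C-001"),)),
    VendorRecord("vendor_b", "country", (("CUSIP", "C-001"), ("ISIN", "I-001"))),
    VendorRecord("vendor_c", "composite", (("ISIN", "I-001"),)),
    VendorRecord("vendor_a", "exchange", (("SEDOL", "S-999"),)),
]
groups = link_records(sample)
```

With the sample above, the first three records land in one group (linked through the shared CUSIP and ISIN placeholders) and the SEDOL-only record stays on its own. Each record keeps its `level`, so a group can hold exchange-level, country-level, and composite data side by side. This runs in close to linear time, so millions of records a day isn't the scary part; deciding when a shared identifier really means "same thing at the same level" is.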

Now, there is an academic problem referred to as "the matching problem" or "the graph matching problem", but as far as I can tell that's a different problem. What academic area of expertise covers this kind of mess?

Date: 2006-11-15 02:07 pm (UTC)
From: [identity profile] joxn.livejournal.com
I understand the inputs, but I don't understand what your ideal output from this problem is. Maybe a concrete example would help me. (If your company doesn't understand it either, then you don't have a computer science problem yet.)

To be honest, this sounds like a problem the ATR (automatic target recognition) industry calls "data fusion". For instance, you've got a synthetic aperture radar and an infrared sensor looking at "the same thing"; how do you align the two data sets so as to establish linkages between corresponding parts? I'm having a hard time seeing how some of the techniques we used would help you, though, because I'm not sure exactly what you're trying to do. Still, some googling around on "data fusion" might help. Maybe also "data alignment" or "manifold alignment".
