Intro and Pub/SubThis is post is going to be a little different. This time I'm going to show off a coding sample that I did for a company awhile back. The full code sample can be found here at my github. So the challenge was to create a solution that could process a remote file and use the data to answer a series of questions. Some of the things that I had to keep in mind were resource utilization, testing, ease of maintenance, and extensionability, add to be easy to add additional questions. Before I started I downloaded the files to get a feel for the data. Now I was told that the data was health care related, but was not told what the columns within the data meant, that was part of the challenge, so I had to do some additional research and infer as best I could. I found that one of the files was a CSV containing some information about medical practices within the UK. The second file, also CSV, was the prescriptions associated with each practice in a certain month. After some additional research I was able to a mapping for the columns in the second file to what they meant in the real world. Now armed with this information I was ready to start coding. As I was thinking about the structure of my code and what steps I would need to take I determined that there were two operations that I could not avoid. The first was that I had to process each file line by line in order to parse the data into something I could use in my code. The second was that I had to rerun the calculations for each question after each record was parsed. I could not just stop processing or skip certain objects since it could impact my calculations. Keeping the requirement of resource utilization in mind, one thing that I did not want to do was walk through each file line by line, parse that line, add that object to an in-memory collection, and then once I was done with the file walk the in-memory collection to do the calculations. I felt that would be bad for 2 reasons, 1) that I would be holding all of the data in-memory and 2) that would add many more operations (in addition to the two already identified) as I would have to walk the entire practice collection and the walk the associated prescriptions for each practice. With that in mind I decided that I wanted a way to do the calculations as soon as each record was done processing. My first thought was to use an event based model. I would raise an event as soon as each line was parsed. That would give me the benefit of redoing my calculations with each new object. The thing I didn't like was that I felt the event model would cause very tight coupling between the file that handled the data source and the file(s) that did the calculations. So I went looking for another answer. Finally, after some more thought and internet searching I landed on the Pub/Sub model. I chose the Pub/Sub model because it gave me the real time calculations I wanted without coupling my data source files to my calculation files. I added the Pub/Sub NuGet package to my solution (version 1.4.1 at the time) and was off to the races. Pub/Sub was super easy to implement. The package is a set of extension methods off of the base Object class so all of my entities had the publish and subscribe methods readily available to them. As you can see it was very simple to publish that an entity (practice in this example) was done being parsed. private void ParsePracticeFromStreamData(Stream downloadStream) { try { using (var stream = new StreamReader(downloadStream)) { string line = String.Empty; while (!stream.EndOfStream) { line = stream.ReadLine(); var prac = Mapper.Map<Practice>(line); this.Publish<Practice>(prac); } } } catch (Exception e) { _logger.LogException(MethodInfo.GetCurrentMethod().Name, e); } } In this method I'm taking the download stream and reading each line, using AutoMapper to map the CSV format to my entity and then Publishing the new Practice object. Since the publish sends out the practice object anybody that is subscribed to the practice will receive the practice object and can do whatever calculations they need. Subscribing to an event is just as easy. If one of the calculation files needed to handle practice data then in the constructor I would just subscribe like this. public BusyPractice() If a file did not need to do anything with the practice data it would not subscribe and then would not be notified. That's my Pub/Sub code in a nut shell.
|
AuthorWelcome to The Blind Squirrel (because even a blind squirrel occasionally finds a nut). I'm a full-stack web and mobile developer that writes about tips and tricks that I've learned in Swift, C#, Azure, F# and more. Archives
April 2018
Categories
All
|