I specialise in the insurance industry. I spend my days building GLM models and calculating the Technical Premium. I just started a new position in a new company and I am becoming extremely confused.
In my previous company, we used WTW Emblem software, which was not cheap but very handy to use.
My manager at the time suggested to use the so called CU line which is explained as "toggles the display of the current model unsimplified approximate, which is defined in Linear Predictor space as Current Model + Observed Average - Current Average. Where the model has a log link function this is equivalent to
Current Model * (Observed Average / Current Average). This gives an indication of the fit to the data, scaled to the model prediction. Comparing this line to the Current Model can give an indication of the mix adjusted fit of the model".
What I was looking in this CU line was a monotonous trend either going up or down.
In my current company we are using R for glm modelling and we dont have CU like as such. We are using average observed and average fitted lines.
What I am truly confused is that my new colleagues are looking into this average observed line, and they are happy to fit it into a model even if its not monotonous trend, even when there are quite a bit of data in each of the levels. Lets say, engine power as a rating factor which is shown below (an extremely simplified example). X axis is the engine power.
They would be happy to fit this variable in a way so the fitted line would closely follow the observed line.
And this is what makes me extremely confused. Are we not looking the for same monotonic trend in the observed variable to decide whether we want to fit it or not. This example if from an insurance industry, but I am sure same principles are applied in other sectors as well.
I think it would be necessary to have industry knowledge to know whether a relationship should expect to be monotonic, or else rising to falling or falling to rising, as these aren't mere numbers, but should relate to the real world in meaningful ways.
In R you can you can use splines, or the segmented packages to have the relationship modelled as one linear rising and a second linear falling about the knot at 14. I assume your colleagues would handle incorporating it in such a way.
I dont have domain knowledge, but I asked bing ( chatgpt) to speculate, I provide the text of that below.
In terms of practical advise for you you, I would encourage you to ask your colleagues, for their explanation/understanding of why the observed trend(s) might be as they are, and see if you can follow their logic. It seems you left that issue unexplored with them ?
Prompt : speculate as to why in a car insurance setting, plotting engine power on x axis and propensity for payouts on the y axis, why might a might a segmented relationship first rising then falling be present in the data ?
Response: The relationship between engine power and propensity for payouts in a car insurance setting might be segmented and first rise then fall due to several factors:
Rising Phase (Low to Medium Engine Power): Cars with low to medium engine power are typically driven by a wide range of drivers, including new, inexperienced, or risk-taking drivers. These drivers may be more likely to be involved in accidents, leading to a higher propensity for insurance payouts. Additionally, cars with medium engine power might be used more frequently for daily commuting, increasing the exposure to potential accidents.
Falling Phase (High Engine Power): Cars with high engine power are often expensive and might be owned by more experienced or cautious drivers who can afford such vehicles. These drivers might drive more safely, maintain their vehicles better, and thus be less likely to claim insurance. Furthermore, these high-powered cars might be used less frequently (e.g., luxury sports cars for weekend drives), reducing the likelihood of accidents.
--
Please note that this is a speculative explanation and the actual reasons could vary based on numerous factors such as driver demographics, vehicle usage, geographical location, and more. It's also important to remember that correlation does not imply causation, and further analysis would be needed to draw definitive conclusions.