I’m thinking about who owns the cache and what throughput the source of truth should be asked to support.
If the calling system owns the cache, then the fallback seems reasonable, but in your case it seems to me that the called system owns the cache. IMO, that makes the “fallback on cache miss” a leak of abstraction to the caller (why should that system know that there is a cache etc.). Additionally, the cache should not be exposed directly to a caller in this case. You would probably want to put a service endpoint in from of all this logic which the caller can issue one call to.
If the calling system owns the cache though, I think it would be interesting for it to declare what kind of scale requirement it wants the called service to support.