OPTIMIZED DISTRIBUTED CLOUD ARCHITECTURES FOR ENTERPRISE SCALE DATA ENGINEERING APPLICATIONS
DOI: https://doi.org/10.65009/re52cm82
Keywords: Optimized Distributed Cloud Architecture, Enterprise-Scale Data Engineering, Latency-Aware Resource Allocation, Predictive Autoscaling, Hybrid Edge–Core–Cloud Integration, Intelligent Orchestration, Fault-Tolerant Computing
Abstract
Distributed cloud computing is now an integral part of enterprise-scale data engineering, where
large-scale heterogeneous workloads require low latency, high throughput, and resilient
execution on geographically distributed resources. This paper introduces an optimized distributed
cloud architecture that combines adaptive workload profiling, latency-aware resource
mapping, data-sensitive placement, predictive autoscaling driven by learning-based demand
prediction, and intelligent fault containment across edge, core, and multi-cloud
systems. The framework dynamically optimizes resource usage, reduces data transfer cost, and
proactively avoids failures through anomaly-aware migration and recovery
provisions. Evaluation on large synthetic and real enterprise workload traces shows significant
performance improvements over current hybrid and distributed
workload architectures. The proposed system reduces execution time by 27-35%, increases
throughput by 22%, and improves overall resource consumption by 30%, and it
substantially cuts migration overhead, energy consumption, and operating cost. The
proactive resilience measures also significantly reduce fault recovery time and the probability of
failure. The findings suggest that the architecture provides a scalable, efficient,
enterprise-scale foundation for next-generation data engineering applications running in
distributed cloud environments.
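
The abstract names predictive autoscaling and latency-aware, data-sensitive placement without giving their details here. The following is only a minimal sketch of how such steps could fit together, not the paper's method. Exponential smoothing stands in for the learning-based demand predictor, and the `Site` fields, the weights, the `headroom` factor, and all function names are hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Site:
    name: str             # hypothetical site label (edge, core, or cloud)
    latency_ms: float     # assumed round-trip latency from the data source
    transfer_cost: float  # assumed cost of moving one GB of data to this site
    free_slots: int       # assumed remaining capacity, in worker slots


def forecast_demand(history, alpha=0.5):
    """One-step-ahead forecast by exponential smoothing (a simple stand-in
    for a learned demand predictor)."""
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate


def replicas_needed(demand, capacity_per_replica, headroom=1.2):
    """Replica count that covers the forecast demand plus a safety margin."""
    return max(1, math.ceil(demand * headroom / capacity_per_replica))


def place(sites, data_gb, replicas, latency_weight=1.0, cost_weight=1.0):
    """Return the feasible site with the lowest combined latency and
    data transfer score, or None if no site has enough free slots."""
    feasible = [s for s in sites if s.free_slots >= replicas]
    if not feasible:
        return None
    return min(
        feasible,
        key=lambda s: latency_weight * s.latency_ms
        + cost_weight * s.transfer_cost * data_gb,
    )


# Illustrative values only; none of these numbers come from the paper.
sites = [
    Site("edge", latency_ms=5.0, transfer_cost=0.0, free_slots=2),
    Site("core", latency_ms=20.0, transfer_cost=0.02, free_slots=8),
    Site("cloud", latency_ms=60.0, transfer_cost=0.09, free_slots=100),
]
demand = forecast_demand([100, 120, 160])
replicas = replicas_needed(demand, capacity_per_replica=50)
chosen = place(sites, data_gb=100, replicas=replicas)
```

In this toy setup the edge site is skipped because it lacks the slots for the forecast load, so the score decides between the core and the cloud. A real deployment would replace the scoring function and the forecaster with whatever models the architecture actually uses.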

