Han et al. Zheng et al. IoT devices integrate the PMS sensor. The PMS sensor uses the principle of laser light scattering to measure the particle concentration between 0. The velocity of light changes when it passes through a particle, which results in the light being deflected. The light disperses in specific directions according to the particle diameter.
The CMAQ model is a comprehensive multipollutant air quality modeling system developed and maintained by the U. For air quality forecasting, this study focused on determining the boundary and initial conditions because they considerably affect the simulation results. Many results of the global air quality model were selected, and the results from the Model for OZone and Related chemical Tracers version 4.
The boundary and initial air quality conditions were extracted from the h forecast results obtained with the MOZART model. Meteorological input data were extracted from previous forecasting WRF. Diagram of air quality forecasting. The model estimated the primary PM 2. The outputs of the model were used to estimate PM 2. Equations for computing PM 2. Overall, the commonly used vehicle type was motorbikes, which accounted for Cars and trucks accounted for Proportion of different vehicle types on Dien Bien Phu street.
According to the data in this figure, 6 a. The peak number of vehicles varied from approximately , to just under , The peaks occurred at 7 a. The traffic decreased after 10 p. The traffic continually dropped and stayed low until 4 a. Average number of vehicles per hour on Dien Bien Phu street. The data were expressed uniformly in g km⁻¹, which facilitated calculation (Tables 3 and 4). From the traffic flow data, the discharge load was calculated and used as input data for the CMAQ model.
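As a rough illustration of how hourly traffic counts and per-vehicle emission factors in g km⁻¹ could combine into a discharge load, consider the minimal sketch below. It assumes the load is the product of vehicle count, emission factor, and road length, summed over vehicle types. The counts, factors, and road length are hypothetical placeholders, not the survey values or the contents of Tables 3 and 4, and the function name `hourly_discharge_load` is introduced only for this example.

```python
def hourly_discharge_load(counts, emission_factors_g_per_km, road_length_km):
    """Return the emission load in grams per hour for one road segment.

    counts: vehicles per hour, keyed by vehicle type
    emission_factors_g_per_km: grams per vehicle-kilometre, keyed by vehicle type
    road_length_km: length of the road segment in kilometres
    """
    total = 0.0
    for vehicle_type, vehicles_per_hour in counts.items():
        factor = emission_factors_g_per_km[vehicle_type]
        total += vehicles_per_hour * factor * road_length_km
    return total


# Hypothetical values for illustration only (not from the study).
counts = {"motorbike": 8000, "car": 1200, "truck": 300}
factors = {"motorbike": 0.05, "car": 0.1, "truck": 0.5}
load = hourly_discharge_load(counts, factors, road_length_km=2.0)
print(f"Discharge load: {load:.1f} g/h")
```

Repeating the calculation for every hour of the averaged traffic profile would give a time-varying emission input of the kind an air quality model needs.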
The discharge load was calculated from the survey data for the industrial zones (Table 5). Simulations were conducted at 1 a. The statistical results indicated that the forecasted wind speed for the So Sao station was more accurate than that for the Bien Hoa and Tan Son Hoa stations.
Due to limitations in measuring equipment, the observed wind speed was often rounded e. Each main wind direction had a range of Furthermore, the simulation data did not include the elevation data of buildings in the city, which was also a cause of the difference in results. The ME index indicated that the predicted temperature was lower than the actual temperature, and the MAE index indicated that the MAE between the forecasted and observed temperatures was approximately 0.
In addition, the RMSE values, which were higher than the MAE values, indicated that the forecast error fluctuated between time periods; however, this fluctuation may not be significant.
The simulation results from the WRF model were combined with the measured data to apply a post-calibration (bias correction) method: the distribution of the bias, F bias, was determined from the ME index of the WRF wind speed and temperature results. The corrected result was obtained by subtracting the F bias from the WRF results. The F bias distributions were determined separately for each monitoring station.
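The per-station correction described above can be sketched as follows. This is a simplified illustration that assumes the bias is summarized by a single mean error (forecast minus observation) per station rather than a full distribution. The temperature values are hypothetical, and the helper names `mean_error` and `bias_correct` are introduced only for this example.

```python
def mean_error(forecast, observed):
    """ME = mean(forecast - observed); a negative value means under-prediction."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)


def bias_correct(forecast, bias):
    """Subtract the estimated bias from every forecast value."""
    return [f - bias for f in forecast]


# Hypothetical temperatures (deg C) for two stations; not data from the study.
forecast = {"So Sao": [27.0, 28.5, 30.0], "Bien Hoa": [26.0, 27.0, 29.0]}
observed = {"So Sao": [27.5, 29.0, 30.5], "Bien Hoa": [27.0, 28.0, 30.0]}

# The bias is estimated and applied separately for each station.
biases = {name: mean_error(forecast[name], observed[name]) for name in forecast}
corrected = {name: bias_correct(forecast[name], biases[name]) for name in forecast}

for name in forecast:
    print(name, "bias:", biases[name], "corrected:", corrected[name])
```

In practice the bias would be estimated on a calibration period and then applied to later forecasts, so that the correction is not evaluated on the data used to fit it.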
The forecasted wind speed results were close to the actual measurements, and the MAE index was approximately 0. The following observations were made:. Concentration of PM 2. Dispersion graph of PM 2. The time period used for the calibration process was 3 days, July 12 to 14, Fig. The time period used for verification was 3 days, July 15 to 17, Fig.
The simulation results during calibration and validation exhibited high consistency with the measured values, as indicated by correlation coefficient R 2 values of 0. The simulation results described the trend of the PM 2. Calibration values on July 12, 13, and 14, Verification values on July 15, 16, and 17, The maximum simulation value of the PM 2.
This trend was consistent with the current air condition of HCMC. However, the spatial distribution contained many implausible spots. This could be explained by the fact that the emission data could not reflect the immediate and volatile characteristics of industrial activity and real-time traffic.
The results also indicated that the PM 2. In the morning, the PM 2. In the afternoon, the concentration reduced marginally to However, the trend of increasing concentration in the afternoon was unstable, which can be explained by the fact that in the rainy season, the PM 2. In another research, the annual average concentrations of PM 2.
Exposure to high-level concentrations of PM 2. The increase in PM 2. Therefore, forecasting the level of PM 2. Several pieces of research work on air quality prediction such as PM 2. Delavar et al. Using machine learning regression models to predict PM 2. In addition to the meteorological and PM 2. These predictive results are also significant in health studies due to the impact of PM 2. There are currently very few studies predicting PM 2.
Information to warn people about PM 2. This study will propose a simple, fast, and accurate machine learning method to predict PM 2. The research results are new to be able to predict PM 2.
This result is a first step toward follow-up projects on pollution forecasting, building an application to warn residents, and proposing options to reduce pollution in HCM City. Based on a review of existing studies in Vietnam, this is a new study that uses machine learning and the WRF model to predict pollution in HCM City.
HCM City's geographic coordinates are between The case study area is the center of HCM City The dataset used in this study includes PM 2. Meteorological data comprise temperature, relative humidity, wind direction, and wind speed. In addition, hourly averaged PM 2. S , Data used for the machine learning experiment are hourly data for five years, from January 1 st , , to December 31 st , , for training and from January 1 st , , to December 31 st , , for the testing model. Study area and data sites.
Because it is so widely applied, WRF has become a community model that benefits from the contributions of its many users around the world. In this study, the WRF model version 4. The model configuration with three nested domains is shown in Fig.
Domain 3, with a 1. WRF model domains. All three domains are set up with the same dynamic and physical parameterization configurations. Table 1 briefly presents the parameterization schemes configured for all three domains in the WRF model.
GFS's weather forecasting model generates datasets with many atmospheric and land-soil variables, including temperatures, winds, precipitation, soil moisture, and atmospheric ozone concentration. The GFS forecast data has a resolution of 0. The model is run four times daily, at 00z, 06z, 12z, and 18z, with a 3-hour temporal resolution. The distribution of PM 2.
A simulation of the historical period was performed to evaluate the WRF model's applicability to generate meteorological data for the PM 2. Meteorological simulation data for 2 months, representing the rainy season (September) and the dry season (January), are used to run a machine learning model to predict PM 2.
The WRF model then runs a forecast of future meteorological data (April and May) as input to a machine learning model that predicts PM 2. Predictions are made for short-term (24, 48, and 72 hours) and long-term (7 days) horizons for analysis and evaluation. The machine learning models are run with several different algorithms to evaluate their efficiency and choose the best predictive algorithm. In this study, the machine learning algorithms are run in Python version 3. Furthermore, these six machine learning algorithms are all regression models, which means that each result is a specific predicted value.
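A comparison of several regression algorithms on a chronological train/test split might be sketched as below. The two models here (`MeanBaseline` and `OneFeatureLinear`) are deliberately simple stand-ins written for this example, not the six algorithms used in the study. The records, the `year` index, and the single wind-speed feature are hypothetical.

```python
import math


class MeanBaseline:
    """Predicts the training-set mean for every sample."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class OneFeatureLinear:
    """Ordinary least-squares line fitted on the first feature only."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = my - self.slope_ * mx
        return self

    def predict(self, X):
        return [self.intercept_ + self.slope_ * row[0] for row in X]


def rmse(observed, predicted):
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def split_by_year(records, test_year):
    """Chronological split: earlier years for training, the last year for testing."""
    train = [r for r in records if r["year"] < test_year]
    test = [r for r in records if r["year"] == test_year]
    return train, test


# Hypothetical records: year index, one feature (wind speed), and PM2.5.
records = [
    {"year": 1, "features": [1.0], "pm25": 35.0},
    {"year": 1, "features": [3.0], "pm25": 25.0},
    {"year": 2, "features": [2.0], "pm25": 30.0},
    {"year": 3, "features": [4.0], "pm25": 20.0},
    {"year": 4, "features": [2.5], "pm25": 27.5},
    {"year": 5, "features": [1.5], "pm25": 32.5},
    {"year": 5, "features": [3.5], "pm25": 22.5},
]

train, test = split_by_year(records, test_year=5)
X_train, y_train = [r["features"] for r in train], [r["pm25"] for r in train]
X_test, y_test = [r["features"] for r in test], [r["pm25"] for r in test]

models = {"mean": MeanBaseline(), "linear": OneFeatureLinear()}
results = {name: rmse(y_test, m.fit(X_train, y_train).predict(X_test)) for name, m in models.items()}
best = min(results, key=results.get)
print(results, "best:", best)
```

Splitting by time rather than at random keeps future observations out of the training set, which matters when the goal is forecasting.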
In this study, the input data of the PM 2. The dataset used for the machine learning model covers five years and is divided into two parts: four years for the training period and one year for the testing period. The flow diagram in Fig. The first step is to split the preprocessed data into two sets, the training set and the test set.
Second, each algorithm is trained on the training set. Next, the test set is fed to each trained model to check how well the algorithm was trained. The final step is to evaluate the performance of each model using the evaluation parameters detailed in section 2. Machine learning flow diagram using six algorithms. R 2 measures the proportion of the variance in the observed variable that can be predicted from the predictor variable.
R 2 for machine learning models with one independent variable can be calculated as below, written here in its standard form as the squared correlation between the n observed values O_i and the predicted values P_i, with means \bar{O} and \bar{P}: $$R^2 = \frac{\left[\sum_{i=1}^{n}(O_i-\bar{O})(P_i-\bar{P})\right]^2}{\sum_{i=1}^{n}(O_i-\bar{O})^2\,\sum_{i=1}^{n}(P_i-\bar{P})^2}$$ RMSE tells how concentrated the data are around the line of best fit. RMSE is commonly used as a standard statistical metric to measure model performance in meteorology, air quality, and climate research studies (Chai and Draxler). The formula is: $$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i-O_i)^2}$$ Mean Absolute Error (MAE) is another helpful measure widely used in machine learning model evaluations.
Therefore, dimensioned evaluations and inter-comparisons of average model-performance error should be based on MAE (Willmott and Matsuura). MAE is calculated according to: $$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i-O_i\right|$$ MAPE is the mean of the absolute percentage errors of forecasts, where the error is defined as the actual or observed value minus the forecasted value (Swamidass). MAPE is the most common measure used in machine learning to quantify prediction error and to find the best model (de Myttenaere et al.).
It is given by the following formula: $$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{O_i-P_i}{O_i}\right|$$ The statistical indicators evaluate the predictive performance of PM 2. Besides, the observed data are hourly averages, so the computed error of the forecasting model can be much higher than in reality.
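The four indicators discussed above (R 2, RMSE, MAE, and MAPE) can be computed with a few lines of standard-library Python. This is a minimal sketch under two assumptions: R 2 is taken as the squared Pearson correlation between observed and predicted values, and MAPE is expressed in percent. The sample values are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error in percent; observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical hourly PM2.5 values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print("R2  :", r_squared(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("MAE :", mae(observed, predicted))
print("MAPE:", mape(observed, predicted))
```

RMSE is never smaller than MAE on the same data, so a large gap between the two points to a few large errors rather than a uniform offset.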
A confusion matrix is an excellent option for reporting the performance of a classification model because it shows the relations between the classifier outputs and the true classes (Diez). The information in the confusion matrix can be used to determine the accuracy of the predictive model. This study can optimize the model evaluation by a confusion matrix based on the U.
EPA, The forecast results are classified based on the U. EPA's PM 2. In this study, the confusion matrix is calculated and presented to evaluate performance in two cases: the selected predictive machine learning model, and the results of running the model with different types of meteorological data. For the machine learning model, the confusion matrix is used to evaluate the predictive results of the testing period.
The model performance is then evaluated from the results of the confusion matrix analysis. For the model runs with two meteorological data sets, the matrices compare the observed values with the forecasts from the model driven by each type of meteorological data (observations and WRF simulation, respectively).
The confusion matrices are then analyzed to draw conclusions about the model's effectiveness when its input meteorological data are simulated by the WRF model. The hourly meteorological and PM 2. The prediction results of the six models on the test data set are compared with observations to evaluate each model's performance.
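The confusion-matrix evaluation can be sketched by binning observed and predicted concentrations into air quality categories and counting how often the two agree. The breakpoints below are an assumption: they are the upper bounds of the U.S. EPA PM2.5 categories as published before the 2024 revision, since the table used in the study is not reproduced here. The sample concentrations are hypothetical, and the helper names are introduced only for this example.

```python
# Assumed category upper bounds in ug/m3 (U.S. EPA PM2.5 breakpoints before
# the 2024 revision); the study's own table may differ.
BREAKPOINTS = [
    (12.0, "Good"),
    (35.4, "Moderate"),
    (55.4, "Unhealthy for Sensitive Groups"),
    (150.4, "Unhealthy"),
    (250.4, "Very Unhealthy"),
]
LABELS = [name for _, name in BREAKPOINTS] + ["Hazardous"]


def categorize(concentration):
    """Map a PM2.5 concentration to its category name."""
    for upper, name in BREAKPOINTS:
        if concentration <= upper:
            return name
    return "Hazardous"


def confusion_matrix(observed, predicted):
    """Rows are observed categories, columns are predicted categories."""
    index = {name: i for i, name in enumerate(LABELS)}
    matrix = [[0] * len(LABELS) for _ in LABELS]
    for o, p in zip(observed, predicted):
        matrix[index[categorize(o)]][index[categorize(p)]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted category equals observed."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical concentrations for illustration only.
observed = [8.0, 20.0, 40.0, 60.0, 30.0]
predicted = [10.0, 30.0, 30.0, 58.0, 36.0]

matrix = confusion_matrix(observed, predicted)
for label, row in zip(LABELS, matrix):
    print(f"{label:32s}", row)
print("accuracy:", accuracy(matrix))
```

Building one matrix per meteorological input (observed versus WRF-simulated) and comparing their diagonals gives a direct view of how much the simulated input degrades the category forecasts.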
The results of the two evaluation methods, statistical errors and the confusion matrix, are presented below, starting with the statistical evaluation of the performance of the machine learning models. The performance of the different models is listed in Table 3, and a scatter chart of predicted and observed results for each model is shown in Fig.
The blue line indicates the simple regression line fitted to the scatter points for the models Figs. The slope of the regression equation in all models is positive and less than 1, and the intercept is positive. First, with a positive slope less than 1, the model gives forecasts lower than the observed values, especially at points where the concentration rises suddenly.
Gupta and Christopher also concluded that when the regression equation of a machine learning model has a positive slope less than 1, the results of the predictive model are lower than those observed at the station (Gupta and Christopher). The scatter becomes sporadic when the observed or predicted values are overestimated or underestimated.
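The effect of a fitted line with a slope below 1 and a positive intercept can be illustrated with a short calculation. The slope and intercept below are hypothetical, chosen only to show the pattern. The line crosses the 1:1 line where predicted equals observed, that is at intercept / (1 - slope).

```python
def fitted_prediction(observed, slope, intercept):
    """Value on the fitted regression line for a given observed concentration."""
    return intercept + slope * observed


# Hypothetical regression coefficients (not values from the study).
slope, intercept = 0.5, 15.0

# Concentration at which the fitted line crosses the 1:1 line.
crossover = intercept / (1.0 - slope)

for observed in (10.0, crossover, 80.0):
    predicted = fitted_prediction(observed, slope, intercept)
    if predicted > observed:
        tendency = "over-predicted"
    elif predicted < observed:
        tendency = "under-predicted"
    else:
        tendency = "unbiased"
    print(f"observed={observed:5.1f}  predicted={predicted:5.1f}  {tendency}")
```

Below the crossover the line sits above the 1:1 line and above it the line sits below, so low concentrations tend to be over-predicted and high concentrations under-predicted.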
These results explain the trend of the models' predictions: lower for high observed values and higher for low observed values. This explanation is very consistent with the future forecast results when the concentration of PM 2. Scatter and fitted plots of predicted and observed values of different models. R 2 always ranges between 0 and 1 and is a direct indicator of model performance.
In Table 3, the R 2 values of the six models range from 0. The highest R 2 value was 0. Both errors are negatively oriented scores, meaning that lower values are better. This study aims to be able to predict PM 2. Compared with some results in other studies, such as Karimian et al. Another study by Brent Lagesse et al.
However, with the prediction of PM 2. In the study of Karimian et al.