
I think machine learning boils down to 2 fundamental issues:

1) Having a model class that is powerful enough to represent the data.

and

2) Being able to find a "good enough" optimal solution over that model class.

In principle, over-fitting can be addressed through the choice of the particular optimization criterion in (2), for example by adding a penalty on model complexity to the objective.
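To make the point about the optimization criterion concrete, here is a minimal sketch in Python. It assumes the simplest possible model class (a single weight w in y = w*x) and an L2 penalty as the change of criterion. The function name `fit_slope`, the parameter `lam`, and the toy data are all made up for illustration.

```python
def fit_slope(xs, ys, lam=0.0):
    """Minimize sum((w*x - y)**2) + lam * w**2 over a single weight w.

    Setting the derivative to zero gives the closed form
    w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical toy data, roughly y = 2x plus noise.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_plain = fit_slope(xs, ys)            # plain least-squares criterion
w_ridge = fit_slope(xs, ys, lam=5.0)   # same model class, penalized criterion

print(w_plain, w_ridge)
```

The model class is identical in both calls; only the criterion changes, and the penalized fit is pulled toward a smaller weight. Whether that actually helps on held-out data depends on the problem and on how `lam` is chosen.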

Neural networks are used because they are powerful enough to represent many data sets of interest and because there exist good algorithms for finding LOCAL optima.

Unfortunately, the objective being optimized is a complex, non-linear, non-convex function of the network's weights, and hence finding a global optimum is, in general, NP-hard.

We are left then with the heuristic of hoping that the local optima we are able to find are "close enough" to the global optimum to meet our needs.
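As a rough illustration of that heuristic, the sketch below runs plain gradient descent on a one-dimensional non-convex function from two different starting points. The function f(x) = x^4 - 3x^2 + x, the step size, and the starting points are arbitrary choices for this example and are not taken from any real network.

```python
def f(x):
    # A simple non-convex function with two separate minima.
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=1000):
    """Follow the negative gradient from x0 with a fixed step size."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_right = gradient_descent(2.0)    # settles in the minimum near x = 1.13
x_left = gradient_descent(-2.0)    # settles in the minimum near x = -1.30

print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs converge, but to different points: the run started at 2.0 stops in the shallower minimum, while the run started at -2.0 finds the deeper one. The algorithm itself gives no signal about which case you are in, which is why "close enough" remains a hope rather than a guarantee.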

This seems to work reasonably well for certain types of problems, like image classification, where the underlying data possess a certain simplicity or "smoothness". It works less well for other problems, like logic problems, where good and bad solutions may not be "nearby" in any sense that the local optimization algorithms are able to discover.