ISYE 6501 Final Questions and Answers with Complete Solutions
Support Vector Machine - A supervised learning classification model. It uses extreme points in the data, against which margin vectors are placed; the hyperplane between these vectors is the classifier.

SVM Pros/Cons - Pros: works well when there is a clear margin of separation; effective in high-dimensional spaces, even when the number of dimensions exceeds the number of samples; uses only a subset of the training points (the support vectors) in the decision function, so it is memory efficient. Cons: not good for very large data sets; not good when the data is noisy, i.e., when target classes overlap; does not directly provide probability estimates.

K-nearest neighbor (K-NN) - A supervised classification algorithm. It looks at the k closest points to a new point and classifies it as whichever class is most common among them.

K-NN Pros/Cons - Pros: makes no assumptions about the data; easy to understand and interpret; versatile. Cons: computationally expensive, because the algorithm stores all the training data; sensitive to irrelevant features and to the scale of the data.

k-fold cross validation - A validation technique in which the data is divided into k subsets. Each subset is used in turn for testing while the rest are used for training; the algorithm rotates through every subset and averages the results.

k-fold cross validation Pros/Cons - Pros: validates the performance of a model; can create balance across predicted feature classes. Cons: doesn't work well with time-series data; the aggregate scores of the model can miss important extreme values, or overpower them so they're harder to pick up on.

k-means clustering - An unsupervised learning heuristic that starts by placing k cluster centers, then assigns every data point to the nearest center. The center of each cluster is then recalculated and all data points are reassigned.
The process repeats until no data points change clusters. The ideal number of clusters can be identified with an elbow diagram.

k-means Pros/Cons - Pros: simple to implement; scales well to large data sets; easily adaptable. Cons: choosing k manually can bias the result toward the initial values; sensitive to outliers.

Grubbs Outlier Test - A test whose statistic uses a suspected outlier's value, the mean of the data, and the standard deviation to determine whether the point falls within the confidence interval for a normal distribution or should be discarded.
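The margin-based classifier described above can be sketched with a simple sub-gradient (Pegasos-style) trainer for a linear SVM. This is an illustrative teaching sketch, not a production implementation; the function names are made up for this example, and the bias is folded into the weight vector for simplicity.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Train a linear SVM with sub-gradient descent on the hinge loss
    (Pegasos-style). Labels in y must be +1 or -1. The bias term is
    folded into the weight vector via a constant feature."""
    X = [list(x) + [1.0] for x in X]           # append constant-1 feature
    rng = random.Random(seed)
    d, t = len(X[0]), 0
    w = [0.0] * d
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:                     # inside the margin: hinge is active
                w = [wj - eta * (lam * wj - y[i] * xj)
                     for wj, xj in zip(w, X[i])]
            else:                              # only the regularizer pulls on w
                w = [wj * (1 - eta * lam) for wj in w]
    return w

def svm_predict(w, x):
    """Report which side of the hyperplane w.x = 0 the point falls on."""
    x = list(x) + [1.0]
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Points with margin below 1 play the role of the support vectors: only they push the weights, which is why the final classifier depends on a subset of the training data.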
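The K-NN rule above fits in a few lines; a minimal pure-Python sketch (function name illustrative) that classifies a new point by majority vote among its k nearest neighbors:

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], x_new))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Note that nothing is "trained": every call scans all the training data, which is exactly the computational-cost con listed above, and the raw Euclidean distance makes the scale-sensitivity con visible too.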
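The k-fold rotation described above can be sketched as follows, using a tiny 1-nearest-neighbor model as an assumed stand-in classifier (both function names are illustrative):

```python
import math

def one_nn(train_X, train_y, x):
    """A tiny stand-in model: predict the label of the single nearest point."""
    return min(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[1]

def k_fold_cv(X, y, k, fit_predict):
    """Split the data into k striped folds; each fold serves once as the
    test set while the others train; return the average accuracy."""
    n = len(X)
    folds = [list(range(start, n, k)) for start in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        correct = sum(fit_predict(train_X, train_y, X[i]) == y[i]
                      for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k
```

The striped split (every k-th index) is one simple assignment scheme; it also illustrates the time-series con, since striping deliberately ignores any ordering in the data.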
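The assign/recompute loop for k-means (Lloyd's algorithm) can be sketched in pure Python; the function name and the random choice of initial centers from the data are illustrative assumptions:

```python
import math
import random

def k_means(points, k, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest center, move
    each center to the mean of its cluster, and repeat until no point
    changes cluster (or iters is exhausted)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(iters):
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignments == assignments:   # converged: no point moved
            break
        assignments = new_assignments
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                      # guard against an empty cluster
                centers[j] = [sum(coords) / len(members)
                              for coords in zip(*members)]
    return centers, assignments
```

Rerunning with different seeds (initial centers) can give different clusterings, which is the "choosing k / initial values" con above; an elbow diagram would plot the total within-cluster distance for each candidate k.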
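The Grubbs statistic itself is easy to compute with the standard library; a minimal sketch (function name illustrative) that returns G but leaves the critical-value comparison to a table, since the standard library has no t-distribution quantile function:

```python
import statistics

def grubbs_statistic(data):
    """Grubbs' test statistic G = max_i |x_i - mean| / s, where s is the
    sample standard deviation. The point with the largest G is declared
    an outlier if G exceeds a critical value derived from the
    t-distribution (looked up in a table, not computed here)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)               # sample standard deviation
    return max(abs(x - mean) for x in data) / s
```

For example, in [1, 2, 3, 4, 100] the value 100 dominates the deviation from the mean, giving a G close to the theoretical maximum of (n-1)/sqrt(n).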