

The dedup command in Splunk removes duplicate values from the results: it drops every event that has an identical combination of values for all of the fields the user specifies, returning only the first value found for that particular search field, so only the most recent log for a particular incident is displayed. Using dedup, the user can also specify exactly how many events with duplicate values, or duplicate value combinations, to retain, either for every value of a single field or for each combination of values across several fields. The events returned by dedup depend on search order: in historical searches the most recent events are searched first, whereas in real-time searches the first events received are returned, which are not necessarily the most recent events that took place. Further options allow the user to keep all events while removing only the duplicated field values, or to keep events in which the specified fields do not exist at all, and the fields can be sorted so that it is clear which events are being retained. For instance, if the user runs "| dedup host", dedup looks at the host field and keeps the first event from each host.

Turning to the differentiation between the uniq and dedup commands: uniq removes any search result that is an exact duplicate of the previous one, so the events have to be sorted before it can be used, and it only removes data when the entire row or event is identical. The dedup command, by contrast, looks only at the fields that are explicitly specified and is far more flexible: it can be map-reduced, the number of duplicate events it keeps can be trimmed to a particular count (defaulting to 1), and it can be applied to any number of fields at the same time. It also offers options such as consecutive, which removes only events whose duplicate value combinations occur consecutively, and keepempty, which retains events that do not have the specified field at all.

One should avoid running dedup on the _raw field when searching over a large volume of data, because the data of every event is then retained in memory, which ultimately hurts search performance. This is expected behavior for dedup and applies to any field with high cardinality and large values.
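To make the dedup options and the uniq comparison above concrete, here is a small, hypothetical sketch (the index name web_logs and the fields clientip and status are only illustrative, not taken from a specific dataset). The first search keeps up to two events for each host and clientip combination, retains events that have no clientip at all, and sorts by time so it is clear which events survive; the second shows the whole-row de-duplication that uniq performs, which is why the results must be sorted first:

index=web_logs | dedup 2 host clientip keepempty=true sortby -_time

index=web_logs | table host clientip status | sort 0 host clientip status | uniq

Because dedup compares only the named fields and can be map-reduced, it is generally the more flexible of the two commands, as described above.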
Adding fillnull for the three hit-count fields to the timechart search shown below gives us an updated table in which the "missing hours" now have rows of zeroes, which tells us that there was no activity during these hours rather than ambiguously not including them. However, the "status" column is still empty for these missing hours. The status is the state of the API endpoint at the end of each hour: if the API was available at the end of the hour then the status is reported as UP and, conversely, if the API was unavailable then the status is reported as DOWN. Since there were no hits during the missing hours, there is nothing to tell us whether our API endpoint was available or not.

We might reasonably assume that for each missing hour, the API status is the same as that of the previous hour. Using this assumption we can use Splunk's "filldown" command to fill in the missing values. Filldown looks for empty values for a particular field and updates them to be that of the last known, non-empty value for that field. Looking at the updated table, the row for 01:00 picks up UP as the last known value for status (which comes from the 00:00 row); similarly, the row for 03:00 picks up DOWN (which comes from the 02:00 row). The timechart and fillnull search is:

index="main" sourcetype="csv" source="test_API_data.csv" host="test_API_data" | timechart span=1hr values(total_number_of_hits) as total_number_of_hits values(successful_hits) as successful_hits values(unsuccessful_hits) as unsuccessful_hits values(status) as status | fillnull total_number_of_hits successful_hits unsuccessful_hits
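As a closing sketch (assuming the same source, host, and field names as the search above, with filldown on the status field as the only addition), the finished pipeline would look like this:

index="main" sourcetype="csv" source="test_API_data.csv" host="test_API_data" | timechart span=1hr values(total_number_of_hits) as total_number_of_hits values(successful_hits) as successful_hits values(unsuccessful_hits) as unsuccessful_hits values(status) as status | fillnull total_number_of_hits successful_hits unsuccessful_hits | filldown status

Filldown is applied only to status here, so the zero-filled hit counts from fillnull are left untouched, and any rows before the first non-empty status value stay empty, since filldown only carries values forward.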
