A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink. A similar flow can be defined using a Thrift Flume Source to receive events from a Thrift Sink or a Flume Thrift Rpc Client, or from Thrift clients written in any language generated from the Flume thrift protocol.

When a Flume source receives an event, it stores it into one or more channels. The channel is a passive store that keeps the event until it's consumed by a Flume sink. The sink removes the event from the channel and puts it into an external repository like HDFS (via Flume HDFS sink) or forwards it to the Flume source of the next Flume agent (next hop) in the flow. The source and sink within the given agent run asynchronously with the events staged in the channel.

    # example.conf: A single-node Flume configuration

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # Describe/configure the source
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Describe the sink
    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100

    # Bind the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

This configuration defines a single agent named a1. a1 has a source that listens for data on port 44444, a channel that buffers event data in memory, and a sink that logs event data to the console. The configuration file names the various components, then describes their types and configuration parameters. A given configuration file might define several named agents; when a given Flume process is launched, a flag is passed telling it which named agent to manifest.

Given this configuration file, we can start Flume as follows:

    $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console

Logging the raw stream of data flowing through the ingest pipeline is not desired behavior in many production environments because this may result in leaking sensitive data or security related configurations, such as secret keys, to Flume log files. By default, Flume will not log such information. On the other hand, if the data pipeline is broken, Flume will attempt to provide clues for debugging the problem.

One way to debug problems with event pipelines is to set up an additional Memory Channel connected to a Logger Sink, which will output all event data to the Flume logs. In some situations, however, this approach is insufficient.

In order to enable logging of event- and configuration-related data, some Java system properties must be set in addition to the log4j properties.

To enable configuration-related logging, set the Java system property -Dorg.apache.flume.log.printconfig=true. This can either be passed on the command line or by setting it in the JAVA_OPTS variable in flume-env.sh.

To enable data logging, set the Java system property -Dorg.apache.flume.log.rawdata=true in the same way. For most components, the log4j logging level must also be set to DEBUG or TRACE to make event-specific logging appear in the Flume logs.

Here is an example of enabling both configuration logging and raw data logging:

    $ bin/flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=DEBUG,console -Dorg.apache.flume.log.printconfig=true -Dorg.apache.flume.log.rawdata=true

The following configuration fans out events from a single source to two channels using a multiplexing channel selector:

    # list the sources, sinks and channels in the agent
    agent_foo.sources = avro-AppSrv-source1
    agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
    agent_foo.channels = mem-channel-1 file-channel-2

    # set channels for source
    agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

    # set channel for sinks
    agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
    agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

    # channel selector configuration
    agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
    agent_foo.sources.avro-AppSrv-source1.selector.header = State
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

The selector checks for a header called "State". If its value is "CA" the event is sent to mem-channel-1, if it's "AZ" it goes to file-channel-2, and if it's "NY" it goes to both. If the "State" header is not set or doesn't match any of the three, the event goes to mem-channel-1, which is designated as 'default'.

The selector also supports optional channels. To specify optional channels for a header, the config parameter 'optional' is used in the following way:

    # channel selector configuration
    agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
    agent_foo.sources.avro-AppSrv-source1.selector.header = State
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
    agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

The selector will attempt to write to the required channels first and will fail the transaction if even one of these channels fails to consume the events.
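The source/channel/sink flow described above can be sketched in a few lines of Python. This is a hedged, minimal model under stated assumptions, not Flume's actual Java API: `Channel`, `source_receive`, and `sink_drain` are illustrative names, and events are modeled as plain dicts with `headers` and `body`. The point is the contract: the channel is a passive buffer, the source stores each event into one or more channels, and the sink later removes events from exactly one channel.

```python
from queue import Queue

class Channel:
    """Passive store: holds events until a sink consumes them (sketch only)."""
    def __init__(self, capacity=1000):
        self._q = Queue(maxsize=capacity)

    def put(self, event):
        # Called on the source side of the pipeline.
        self._q.put(event)

    def take(self):
        # Called on the sink side; source and sink run independently.
        return self._q.get()

def source_receive(event, channels):
    """A source stores a received event into one or more channels."""
    for ch in channels:
        ch.put(event)

def sink_drain(channel, n):
    """A sink removes events from its single channel."""
    return [channel.take() for _ in range(n)]

ch = Channel(capacity=3)
source_receive({"headers": {}, "body": b"hello"}, [ch])
source_receive({"headers": {}, "body": b"world"}, [ch])
events = sink_drain(ch, 2)
print([e["body"] for e in events])  # -> [b'hello', b'world']
```

Because the channel decouples the two sides, the source can keep accepting events (up to channel capacity) even while the sink is slow or temporarily down, which is exactly the buffering role the text assigns to the channel.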
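The multiplexing selector's routing rule (match the configured header against the mappings, fall back to the default) can be illustrated with a small Python sketch. This is an assumption-laden model, not Flume's `MultiplexingChannelSelector` implementation; `select_channels` is a hypothetical helper, and the mapping mirrors the "State" example configuration.

```python
def select_channels(headers, header_name, mapping, default):
    """Return the channels for an event: mapped by header value, else default."""
    return mapping.get(headers.get(header_name), default)

# Mirrors the example config: CA -> mem-channel-1, AZ -> file-channel-2,
# NY -> both, anything else -> the default channel.
mapping = {
    "CA": ["mem-channel-1"],
    "AZ": ["file-channel-2"],
    "NY": ["mem-channel-1", "file-channel-2"],
}
default = ["mem-channel-1"]

print(select_channels({"State": "AZ"}, "State", mapping, default))  # -> ['file-channel-2']
print(select_channels({"State": "NY"}, "State", mapping, default))  # -> ['mem-channel-1', 'file-channel-2']
print(select_channels({}, "State", mapping, default))               # -> ['mem-channel-1']
```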
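The required-versus-optional channel semantics can also be sketched: required channels are attempted first and any failure fails the whole transaction, while failures on optional channels are swallowed. Again a hedged sketch, not Flume code; channels are modeled as callables that raise on failure, and `deliver` is a hypothetical name.

```python
def deliver(event, required, optional):
    """Write an event to required channels first, then best-effort optionals."""
    for ch in required:
        ch(event)          # any exception here aborts the transaction
    for ch in optional:
        try:
            ch(event)      # optional-channel failures are ignored
        except Exception:
            pass

consumed = []
def ok(event):
    consumed.append(event)
def full(event):
    raise IOError("channel failed to consume the event")

# Succeeds even though the optional channel fails.
deliver("e1", required=[ok], optional=[full])

# Fails because a required channel refuses the event.
try:
    deliver("e2", required=[full], optional=[ok])
    failed = False
except IOError:
    failed = True

print(consumed, failed)  # -> ['e1'] True
```

Note that in the failing case the optional channels are never reached: the required write raised first, matching the text's "required channels first" ordering.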